INTRODUCTION FROM THE VOLUME EDITORS


The Southeast Asian Linguistics Society (SEALS) was founded in 1990 by Martha Ratliff and Eric
Schiller. Until 2020, SEALS conferences had been convened annually for 29 years, but the
Covid-19 pandemic resulted in sudden and dramatic restrictions on travel. The original 30th meeting
was to be hosted by the Department of Linguistics of the University of Hawaii, but that conference was 
cancelled, like so many other academic activities. As the pandemic lingered, it was decided that rather 
than lose another year, conference video presentations would be posted, and conference proceedings 
would follow. In part due to the shift to the JSEALS journal publications, the last SEALS conference 
proceedings volume for SEALS XVI was published in 2006, making these papers the first such SEALS 
proceedings in 17 years. 


The papers in this special publication of JSEALS were written based on the video presentations of the 
SEALS 2021 conference launched online on the 1st of June. While this lacked the interactive aspects
associated with a normal international conference, participants were enthusiastic. All the video 
presentations associated with the articles in these proceedings are listed with live links on page vii of 
this volume. The complete listing of videos and supplemental materials is available here:
https://sites.google.com/site/sealsjournal/seals-and-jseals-history/seals-xxx-online-2021/seals-2021-program.


The papers were written after the talks, and authors necessarily made modifications to their works, but 
the videos, much like handouts of a conference, provide a point of reference of the core ideas of the 
authors. This is the first time that SEALS conference presentation videos have been made available 
with the subsequent conference proceedings. 


That over 350 pages of 21 articles have been assembled and published during challenging circumstances 
around the world attests to the vigor of the field of Southeast Asian linguistics. 


Editors 

Mark Alves Paul Sidwell 
Montgomery College University of Sydney 
JSEALS Editor-in-Chief SEALS President 


SEALS Publishing Officer 


FROM THE JSEALS EDITOR-IN-CHIEF 


This is the eighth JSEALS Special Publication. The goal of JSEALS Special Publications is to share 
collections of linguistics articles, such as select papers from conferences or other special academic 
events, as well as to offer a way for linguistic researchers in the greater Southeast Asian region to 
publish monograph-length works. 

The volume contains 21 papers in total: five papers on historical linguistics, eleven papers on 
syntax and/or morphology, and five papers on phonetics/phonology. The languages covered in this 
volume are spoken throughout the greater Southeast Asian region: Mainland Southeast Asia, Insular
Southeast Asia, Southern China, and the Indian Subcontinent. The papers range from detailed 
descriptions of linguistic aspects of understudied languages to probing questions related to multiple 
groups of languages in the region. 

We are very pleased that JSEALS is able to contribute to the sharing of quality linguistic research 
in Southeast Asia, and we welcome and encourage proposals for issues going forward.


Mark J. Alves 
January 15th, 2022
Montgomery College 
Rockville, Maryland 


List of SEALS XXX (2021) Conference Presentation Videos 


The list below contains all the videos associated with the articles in the SEALS XXX conference that 
were posted online on 1 June 2021. The entire collection of videos is housed at the SEALS-JSEALS
site on Google (https://sites.google.com/site/sealsjournal/seals-and-jseals-history/seals-online-2021/seals-2021-program)
and can also be found on the SEALS 2021 YouTube page
(https://www.youtube.com/channel/UCTT4-3HEvM_Z65tHQHXfvFw). Additionally, the 
downloadable videos in the links below have permanent DOI identifiers provided by Zenodo and have 
Creative Commons 4.0 licensing, meaning they can be freely downloaded and shared. 
VIETIC ETYMA VERSUS EARLY CHINESE LOANWORDS
IN THE DOMAIN OF GRAMMATICAL VOCABULARY 


Mark J. ALVES 
Montgomery College 


mark.alves@montgomerycollege.edu


Abstract 

In this study, over a hundred and twenty Vietnamese function words with Vietic 
reconstructions and/or status as early Chinese loanwords (i.e., those borrowed before the 
Sino-Vietnamese layer associated with Late Middle Chinese character readings) have been
assembled, grouped by subcategory, and given historical linguistic context. In this lexical 
data, most basic function words (e.g., numerals, pronouns, locative terms, time words 
related to the natural world) are native etyma, while most words with other functions (e.g., 
aspect, modality, comparison, time/aspect, etc.) are early Chinese loanwords. Despite the 
early borrowing of Chinese function words, current evidence suggests 
structural/typological changes in Vietic happened primarily in the later Viet-Muong 
period, and before that, this northern Vietic group did not undergo substantial 
morphophonological restructuring for several centuries from the time of language contact 
with Sinitic. This scenario implies significant Sinitic-Vietic bilingualism in those early 
centuries, but it also shows Vietic maintained sufficient sociolinguistic status to retain core 
function words. However, this early language contact set the stage for typological 
convergence into the second millennium and the speciation of Viet-Muong. 


Keywords: Vietic, Austroasiatic, Chinese loanwords, function words, historical syntax 
ISO 639-3 codes: vie, zhx, och, ltc, mtq, tai


1 Grammatical features and words in Vietnamese 

Vietnamese is a Vietic language which has undergone morphophonological restructuring largely due to 
language contact with and lexical borrowing from Sinitic (and presumably Tai but with much less 
lexical exchange). This is also the case for the dozens of closely related Muong lects, together 
constituting the Viet-Muong sub-branch of Vietic. Based on historical, archaeological, and comparative 
linguistic data, the earliest period of substantial language contact between the pre-Proto-Viet-Muong 
(i.e., a stage after Proto-Vietic but before the speciation of the Viet-Muong sub-branch) speakers and 
Sinitic-speaking groups is from the Han Dynasty (202 BCE-220 CE), with the most significant early
contact from the first century CE. 

What was the degree of typological similarity or difference between Sinitic and Vietic at that time? 
There is a good deal of information about Sinitic linguistic structures at the time of Sinitic-Vietic 
contact. Textual data shows that Archaic Chinese had SVO structure, and modifiers preceded nouns in 
noun phrases (Aldridge 2015, Peyraube 1996). Reconstructions of Old Chinese phonology include 
presyllabic material (Baxter and Sagart 2014a and 2014b) and final fricatives *-s and *-h and a
glottal stop but no tones (Pulleyblank 1977-1978, Zhengzhang 2000, Schuessler 2007, Baxter and 
Sagart 2014a and 2014b). Old Chinese has also been reconstructed with derivational prefixes and 
suffixes (Pulleyblank 2000, Schuessler 2007:38-50, etc.), but without evidence of a high degree of 
productivity. 

Regarding this northern pre-Proto-Viet-Muong speech community, as an Austroasiatic language 
group, it most likely had a typical Austroasiatic typology and core Austroasiatic vocabulary, in addition 
to any innovations this branch had developed since the Austroasiatic dispersal. It would have had 
reduced presyllables and no tones, but final fricatives *-s and *-h and a glottal stop, and possibly 
phonemic phonation (e.g., Diffloth 1986). As for morphosyntax, Vietic had derivational prefixes and 
infixes, though no data exists to help ascertain the degree of productivity. Vietic noun phrase structure 
has been hypothesized to have been like that of Old Khmer (i.e., head-initial structure with all modifiers
following the head noun and no development of classifiers), alongside SVO clause structure with a topic-comment
tendency and a lack of explicit passive-voice marking (see Alves 2020).

Overall, with the exception of headedness in noun phrases, Vietic shared many typological features
with Sinitic. This was likely before (or early in the process of) major sub-branching in Vietic, and thus 
long before the Viet-Muong stage. We can also assume that this group’s grammatical vocabulary was 
entirely native at that point, including Austroasiatic retentions and Vietic innovations. 

However, from the early centuries of language contact, dozens of grammatical words were 
borrowed from Sinitic into this portion of the Vietic lexicon. While Muong lects have some early 
Chinese loanwords (hereafter, “ECLs”), Vietnamese has the largest quantity, as will be shown in this 
study. As a result, today, Vietnamese grammatical vocabulary consists of a mixture of native Vietic and 
Austroasiatic etyma and Chinese loanwords. The latter include (a) ECLs from Late Old Chinese (i.e., 
towards the end of the Han Dynasty) and Early and pre-Late Middle Chinese (i.e., before the second 
millennium CE), all of which have native Vietic phonological features, and (b) the later Sino- 
Vietnamese layer, which is associated with Chinese character pronunciations of Late Middle Chinese 
of the early second millennium. 

With an admittedly wide scope of the meaning of “grammatical vocabulary”, the focus of this study 
is on function words, or at least vocabulary with a functional nature in semantico-syntactic structure 
and with relatively abstract, functional semantic properties. This study considers numerals and quantity 
terms, pronouns and interrogative terms, measure words, and locational and temporal terms, among 
others. Such vocabulary overlaps with the borrowing of matter and of pattern (see Sakel 2007). Thus, 
while grammatical vocabulary does not on its own demonstrate structural influence and typological 
changes, such vocabulary can at the very least enhance semantico-syntactic features, as well as possibly 
impact relevant semantic systems (e.g., restructuring of the system of Vietnamese terms of address and 
reference (see Alves 2017)). 

Moreover, the borrowing of functional words can indicate the degree of intensity of language 
contact and bilingualism. It tends to correspond to the degree of structural impact, as posited by 
Thomason and Kaufman (1988:74-75). Morphophonological restructuring and some change in the noun 
phrase is what eventually occurred in the northern part of Vietic, which became Viet-Muong and from 
which Vietnamese developed. However, this was a process involving several centuries to a millennium, 
not all of which can be directly attributed to language contact with Chinese. Nevertheless, mitigating 
the perceived linguistic impact of Chinese, the lack of influence in certain subdomains highlights how 
this pre-Viet-Muong northern Vietic maintained a distinct linguistic status and identity. We will return 
to the question of ethnolinguistic implications in the conclusion. 

In previous studies (Alves 2001, 2005, 2007a, 2007b, 2009), I have explored Chinese loanwords 
in the domain of grammatical vocabulary in Vietnamese, as well as possible structural influence of 
language contact with Chinese. However, these studies incorporated limited historical phonological and 
historical syntactic data for support. In this study, regarding the historical phonological aspects, the data 
herein have been evaluated with respect to historical phonological studies of Ferlus (1982, 1992, 2014, 
etc.), Nguyén T. C. (1995), and Nguyén V. T. (2005), consideration of Middle and Old Chinese of
Baxter and Sagart (2014a) and Schuessler (2007), as well as my own observations about the 
chronological development of tones in both Sinitic and Vietnamese (Alves 2018a). I have also 
increasingly applied ancient Chinese and Vietnamese textual data together with historical phonological 
matters (see Alves 2018b regarding the triplet cũng ‘also’, cùng ‘together with’, and cộng (a bound
morph with the general sense of ‘together’), all related to Chinese 共 gòng ‘total’ (OC *N-k(r)oŋʔ-s,
MC gjowngH) and Alves 2020 regarding Vietic noun phrase structure). Lastly, we now have a
substantive study of the Austroasiatic grammatical lexicon, with Proto-Austroasiatic reconstructions 
(Shorto 2006, used for all Proto-Austroasiatic reconstructions except those marked #, which are my 
suggested reconstructions) and comparative data (Alves, Jenny, and Sidwell 2020), which serve as a 
further useful point of comparison. 

I cannot demonstrate the same depth as the latter articles for all the words in this study, but I have 
made efforts to (a) note historical phonological patterns for all selected items and (b) check ancient 
Chinese textual resources. I have periodically checked two Chinese language dictionaries of Ancient 
Chinese words (Guhanyu Cidian: Dongfang xuesheng gongjushu (1997), Guhanyu Changyongzi Zidian
(2005)) and Vuong Loc’s (2002) dictionary of ancient words in Vietnamese to determine that certain 
meanings and syntactic structures are attested in the past. I have also checked ancient Chinese textual 
data at the ctext.org database, focusing on the early period in question, though finding examples with 
precise meanings and syntactic distribution is not always possible. For native Vietnamese words, there 
is no textual data outside of Vietnamese-language Nom documents extending back to the 1200s. 
However, the types of native vocabulary tend to be less controversial and have higher certainty (e.g.,
numerals and basic time words) and are often supported by comparative data in Austroasiatic (e.g.,
pronouns and locative words), further strengthening the certainty of chronological depth. Even stronger
arguments can be made with more careful identification of textual evidence, but that is a long-term 
matter for after this study. 


2 Grammatical vocabulary in Vietnamese 

This study of grammatical vocabulary presents over sixty native etyma and about sixty ECLs. Thus, 
these words have a history extending from the second millennium BCE before the Han expansion into 
northern Vietnam into the first several centuries of Sinitic-Vietic contact in the first millennium CE. 

Table 1 provides very general time periods based on historically and archaeologically attested
periods. The span covered runs from the Austroasiatic dispersal, hypothesized to be about 4000 BP, into
the mid-first millennium CE. Contact with Sinitic within that span lasts from Late Old Chinese through
Middle Chinese before the Late Middle Chinese stage. The key Chinese dynasties during which
historically documented early migrations of Chinese arrived in northern Vietnam are the Han Dynasty
(206 BCE-220 CE) and the Jin Dynasty (266-420 CE). More migrants undoubtedly arrived in the following
centuries, but as yet, I have seen no other supporting textual evidence with details. This period also
presumably leads into the development of the hypothetical Annamese Chinese, as per Phan (2013). The
descendants of speakers of this variety of Chinese ultimately shifted to Viet-Muong in the early second millennium.
Chinese function words borrowed from the turn of the second millennium CE and the emergence of 
Viet-Muong (i.e., dictionary Chinese character readings) are not part of this study. 

Regarding the Annamese Chinese hypothesis, whether a specific Chinese dialect emerged in this 
area—parallel to the emergence of Chinese dialect groups elsewhere in southern China—cannot be 
known. Nevertheless, the grammatical lexical data in this study could not have been borrowed without 
a large enough Chinese-speaking community embedded within this pre-Proto-Viet-Muong community.


Table 1: Stages from Austroasiatic to Vietic


Stages | Time Periods
1. Austroasiatic dispersal | c. 2000 BCE
2. Development of Vietic as a distinct group and early contact with Kradai (pre-Proto-Tai?) and early but limited contact with Sinitic | Between 2000 and 1 BCE
3. Initial substantial Sinitic-Vietic contact and development of local Annamese Chinese (Late Old Chinese to Middle Chinese) | First several centuries of first mill. CE
4. Development of Viet-Muong and shift of Annamese Chinese to Viet-Muong (Late Middle Chinese) | Early 2nd mill. CE


Analysis of subcategories of ECLs allows for historical linguistic inferences from the pre-Proto-Viet- 
Muong period. Native proto-language etyma include native numerals (‘one’ to ‘ten’), core pronouns 
(first-, second-, and third-person singular, second-person plural, and determiners), question words 
(‘who’, ‘what’, ‘where’), several locational terms, and core time words. Thus, the grammatical terms 
which are reconstructable to an early stage of Vietic are genuinely basic vocabulary (e.g., pronouns and 
numbers) or relatively more basic (e.g., interrogative words and location words). Some of these have 
central roles in clause structure. 

ECLs in Vietnamese, in contrast, are less central to clause structure but instead have other 
semantic-syntactic functions (e.g., modal verbs, conjunctions, unit terms) or even have a cultural nature 
(e.g., calendar terms). ECLs fall into several categories, including a handful of locative, measure, and 
numeric terms, but a notable number of clause-connective and preverbal auxiliary words, as well as 
time words related to the Chinese zodiac calendar. Grammatical ECLs in Vietnamese are mostly not
seen in Vietic languages other than Muong, but they nevertheless represent borrowing in a relatively 
early stage of Vietic history, at least the predecessor to Viet-Muong. Vietnamese and other Vietic 
languages have numerous additional Chinese loanwords from the Late Middle Chinese period, but these 
were borrowed after the emergence of Viet-Muong and a stage in which Vietnamese was distinct. Thus, 
those true Sino-Vietnamese loanwords are not relevant to the sociocultural impact in the first part of the
first millennium CE. 

The matter of identification of ECLs requires explanation. Sino-Vietnamese vocabulary dating
to the Late Middle Chinese period can be readily identified since such words are listed in dictionaries
as Chinese character readings. In contrast, ECLs must be identified through the comparative method, 
and there generally cannot be absolute certainty regarding such proposed loanwords. The ECLs in this 
study have been assessed for their phonological, semantico-syntactic, and sociocultural viability, and 
based on these factors, they have been evaluated with medium to high certainty of ECL status. Many 
other candidates have been excluded, and some items in this study may later be excluded as data is 
further evaluated. But in general, considering the consistency of the phonological patterns and quantity 
of words in semantic domains which provide reinforcing support for certainty, a large majority of these 
are strong candidates as ECLs. Thus, though some details herein remain tentative, the broader claims 
(e.g., sufficient Sinitic-Vietic bilingualism to facilitate substantial borrowing of grammatical words in 
this early period) must be considered viable hypotheses unless strong counterevidence shows otherwise. 

Additional clarification of these words’ histories is provided through Muong lexical data found in 
the substantive Muong Bi dictionary of Nguyén V. K. et al. (2001). Comparable Muong words are 
provided in various tables in this paper to show the presence of probable ECLs in Viet-Muong, not only 
Vietnamese, thus supporting the dating of these items to that early period, but data from more Muong 
varieties is needed. While Vietnamese cannot be ruled out as a donor language in some of the Muong 
words, others have phonological features that mark them as predating the modern era. Phan (2013:320- 
321) has also presented some data of Sino-Vietnamese grammatical morphemes shared by a few 
varieties of Muong. He similarly noted phonological features of these words to suggest that these were 
indeed borrowed directly from Annamese Chinese, rather than through Vietnamese. He further proposes 
that these predate the speciation of Viet-Muong. While this is one reasonable hypothesis, another 
possibility is that they were borrowed variously before and after speciation. Regardless, at least some 
are genuine ECLs, which supports the presence of a Chinese-language community and widespread 
bilingualism with Viet-Muong in the first millennium. 

However, as will be shown, many grammatical ECLs in Vietnamese do not appear in the Muong 
Bi data, suggesting a different type and degree of language contact with Chinese. Also, though Nguyén 
V. T.’s (2005) 1,200-word list for some 30 Muong varieties does contain several grammatical ECLs 
listed in this study, in various instances, not all varieties of Muong are shown to have those ECLs. For 
example, while all the Muong varieties have cognates for the Vietnamese ECL khá ‘rather/very’ (see
Table 20), Muong cognates for Vietnamese ECL cũng ‘also’ (see Table 19) are in only 10 of the 30
Muong lects, with another form /i/ in the other 20 (Nguyén V. T. 2005:230 and 203 respectively). Thus,
we can hypothesize that different groups of Viet-Muong had different periods of contact with Chinese 
in different locations, perhaps during the time in which Vietnamese and varieties of Muong were 
differentiated. Nevertheless, the idea that Viet-Muong, as opposed to other Vietic groups, shared a more 
intense degree of contact with Chinese—thereby spurring the speciation of Viet-Muong as a Vietic sub- 
branch—is supported by the larger quantity of Chinese loanwords into Vietnamese and varieties of 
Muong than other sub-branches of Vietic. 

Finally, a more thorough study of these proposed ancient loanwords would require careful 
identification of relevant developed senses, functions, and distributional patterns of grammatical words 
in ancient Chinese texts. However, in most cases, the semantics are straightforward and either have 
been retained in Chinese into the modern era or, in a few cases, are older senses that can be readily 
located in references on ancient literary Chinese. For example, Vietnamese khá ‘rather/very’ has a sắc
rather than hỏi tone, marking it as an ECL, and the Chinese source 可 kě ‘able’ has, among other
functions, precisely this preverbal intensifier position and semantic function and did so at least as
early as the Tang Dynasty (618-907 CE). In some instances, some ECLs were not originally
grammaticalized throughout Sinitic but became grammaticalized in Vietnamese, and sometimes
Muong. Such cases are noted and of course cannot be considered direct influence of Chinese, but they 
are, nonetheless, ECLs with grammatical meanings. They could also potentially represent developments 
of the Annamese Chinese speech community in the region in the first millennium CE, in which case, 
some of these could be borrowed grammatical words, but then the timing and source of the
grammaticalization would become harder or even impossible to determine. 


3 Categories of function words: Vietic and Early Chinese loanwords

This section covers Vietnamese function words and their etymological origins. Table 2 presents an 
overview of the eight categories of grammatical words and the numbers of words in the two categories 
of Vietic etyma and ECLs. As is shown, pronominal, quantity, and locational words are predominantly 
native etyma. ECLs are dominant in the categories of measure words and words with conjunctive, 
comparative, and modal functions. Thus, these are words with potential significance to or even impact 
on phrase structure in northern Vietic. The number of time words is similar in both columns, but native 
words are core time words (e.g., ‘day’, ‘month’, ‘year’, etc.). ECLs in this domain are secondary (e.g., 
‘hour’, ‘now’, ‘turns’, ‘instances’, etc.) or are related to the sociocultural system of the Chinese zodiac 
calendar. The total number of grammatical ECLs in Table 2 is 62. The number is 73 when Chinese
zodiac calendar terms are included; these are cultural content words rather than function words, but
they overlap semantically with temporal function words, so they are noted herein.


Table 2: Categories and numbers of Vietic and ECL items in Vietnamese


Category | Vietic Etyma | Early Chinese Loans
1. Pronouns and interrogative terms | 11 | 0
2. Numbers and quantity expressions | 18 | 9 (mostly quantity terms)
3. Locational words | 12 | 5
4. Time words | 11 | 12 (and 11 calendar terms)
5. Measure words and units | 8 | 17
6. Conjunctive terms | 1 | 5
7. Comparative terms | 1 | 8
8. Modals and preverbs | 1 | 6
Total | 63 | 62 (or 73)
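
As a cross-check on the column totals in Table 2, the counts can be tallied in a few lines. The sketch below is only an illustration and not part of the study's method: the names TABLE_2, CALENDAR_TERMS, totals, and ecl_share are hypothetical, and the count for conjunctive ECLs is taken to be 5, the value that makes the column sum to the stated total of 62.

```python
# Counts copied from Table 2 as (Vietic etyma, early Chinese loans).
# Assumption: the conjunctive ECL count is 5, which matches the stated total of 62.
TABLE_2 = {
    "Pronouns and interrogative terms": (11, 0),
    "Numbers and quantity expressions": (18, 9),
    "Locational words": (12, 5),
    "Time words": (11, 12),
    "Measure words and units": (8, 17),
    "Conjunctive terms": (1, 5),
    "Comparative terms": (1, 8),
    "Modals and preverbs": (1, 6),
}

# Zodiac calendar terms are counted separately from function words in Table 2.
CALENDAR_TERMS = 11


def totals(table):
    """Return (total Vietic etyma, total ECLs) summed over all categories."""
    vietic = sum(v for v, _ in table.values())
    ecl = sum(e for _, e in table.values())
    return vietic, ecl


def ecl_share(table):
    """Return the proportion of ECLs within each category."""
    return {cat: e / (v + e) for cat, (v, e) in table.items()}


if __name__ == "__main__":
    vietic_total, ecl_total = totals(TABLE_2)
    print(vietic_total, ecl_total, ecl_total + CALENDAR_TERMS)
    for category, share in ecl_share(TABLE_2).items():
        print(f"{category}: {share:.2f}")
```

The per-category shares make the asymmetry described in the text easy to see: ECLs form the majority in the measure, conjunctive, comparative, and modal categories, and none of the pronominal items.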


A few ECLs have spread widely throughout Vietic, enough to be reconstructable, as shown in Table 3. 
The timing of the borrowing is obviously not to the proto-Vietic level but rather sometime in the first 
millennium CE. Other Vietic languages have borrowed grammatical words from Vietnamese recently, as
shown by their phonological traits (e.g., the Sino-Vietnamese passive marker bị in Chut languages), and
such items thus cannot be valid reconstructions. However, the phonological features of the words in other Vietic
languages listed in Table 3 mark them as earlier loans, though whether directly from Chinese or another 
Vietic language cannot be known. Regardless, this small number of items highlights the spread of ECLs 
in Vietic many centuries ago but also the limited impact on function words outside of Viet-Muong. 


Table 3: Grammatical ECLs which are reconstructable in Vietic


Gloss | Viet. | SV | Proto-Vietic | Muong | Chinese | OC | MC
(classifier) | cái | cá | #kaj? | cai | 個 gè ‘piece, item’ | *kˤa[r]-s | kaH
many/much | nhiều | nhiêu | #new | (tt) | 饒 ráo ‘abundant’ | *[n]ew | nyew
pair | đôi | đối | *torj | tdi | 對 duì ‘pair’ | *[t]ˤ[u]p-s | twojH
side | bên | biên | #pe:n | pén | 邊 biān ‘side’ | *pˤe[n] | pen

Another aspect considered in this study is borrowability rates. WOLD (the World Loanword Database) 
lists borrowability rates of classes of words, providing criteria for a select list of basic vocabulary, the 
Leipzig-Jakarta 100-word list (Tadmor et al. 2010), which is relatively resistant to borrowing. For this 
study, the statistical ranges of borrowability rates are interpreted in a relative manner: one can posit very
low (e.g., below 0.10, typical for items in the Leipzig-Jakarta list), somewhat low (from 0.10 to 0.25),
and medium or higher rates. These are impressionistic ranges, but they are based on ranges of rates
found in WOLD statistics. Where necessary in subsequent subsections, I provide interpretation of the 
borrowability rates seen in subgroups of words. 
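
The three impressionistic bands just described can be stated as a simple threshold rule. The following is a minimal sketch under stated assumptions, not a procedure taken from WOLD: the function name borrowability_band is hypothetical, and the treatment of the boundary values (0.10 and 0.25 both counted as "somewhat low") is an assumption, since the text gives the ranges only approximately.

```python
def borrowability_band(rate: float) -> str:
    """Map a borrowability rate (0.0 to 1.0) to one of the three relative bands.

    Assumed boundaries: below 0.10 is "very low", 0.10 through 0.25 is
    "somewhat low", and anything above 0.25 is "medium or higher".
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError(f"rate must be between 0 and 1, got {rate}")
    if rate < 0.10:
        return "very low"
    if rate <= 0.25:
        return "somewhat low"
    return "medium or higher"


if __name__ == "__main__":
    # Rates cited in this paper: 'you (2s)' at 0.04, and numerals two to ten
    # spanning 0.21 to 0.29.
    for label, rate in [("you (2s)", 0.04), ("numeral, low end", 0.21),
                        ("numeral, high end", 0.29)]:
        print(label, rate, borrowability_band(rate))
```

Because the bands are relative, shifting either cut-off slightly would not change how the subgroups discussed in the following subsections compare with one another.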

This study secondarily explores the matter of relatively more basic vocabulary within the domain 
of grammatical vocabulary, as demonstrated precisely in the kinds of native retentions in Vietnamese 
versus ECLs. In this way, the data from WOLD gives some perspective regarding the etymological 
sources of the words, and while WOLD statistics cannot be considered strong support for claims of 
loanword status, the statistical tendencies in WOLD provide another angle to consider and should be 
used as a point of reference in establishing word origins. The following subsections of types of 
grammatical words are organized as they are in Table 2. 


3.1 Pronouns and Interrogative Terms 

Vietnamese pronouns and interrogative terms highlight the native Vietic and Austroasiatic origins of 
the Vietnamese lexicon. In the data of WOLD, among the forty-one languages, these terms have very 
low borrowability rates, 0.07 or lower, as in Table 4. Such rates match rates of other basic vocabulary 
in the Leipzig-Jakarta wordlist, such as ‘you (2s)’ at 0.04, ‘this’ at 0.00, and ‘who’ at 0.03. 


Table 4: Borrowability rates of pronouns and referential terms 


Borrowability Rates Categories 
0.00 - 0.07 Personal pronouns 
0.00 - 0.06 Interrogative, information-question words 
0.00 - 0.03 Demonstratives/Deixis 


The modern Vietnamese system of pronouns has been somewhat restructured from the original Vietic 
system (which is admittedly only partially reconstructable). There is notable impact on the overall 
system and related pragmatics, resulting directly and indirectly from language contact with Chinese 
(see Alves 2017). Despite this impact stretching out over several centuries from the first to second 
millennium CE, the pronouns and interrogative terms in this data are almost exclusively native
retentions. The native pronouns in Table 5 include first, second, and third-person singular pronouns and 
the proximal demonstrative. 


Table 5: Proto-Vietic pronouns


Gloss | Proto-Vietic | Proto-Austroasiatic | Vietnamese
1s | *so: | NA | tao
2s | *mi: | *mi[1]?; *miih | mày
2p | #baj | *pej | bay
3s | *han? | *[?]an? | hắn
3s | #na:? | *nV? | nó
this | ni: | *ni?; *nih | này, nay


The data in Table 5 deserves comments. While the second and third-person Proto-Vietic forms are 
Austroasiatic etyma (again, from Shorto 2006 in this table and all others), the first-person form is an 
apparent Proto-Vietic innovation of unknown origin. Not listed in Table 5 is the only pronominal ECL, 
Vietnamese họ, which is a third-person plural pronoun connected to Chinese 戶 hù ‘household’ (OC
*m-qˤaʔ, MC huX). However, this grammaticalization represents an innovation in Vietnamese, so this
ECL might not have been borrowed as a pronoun. Only further exploration in Nôm texts can determine
whether any evidence can elucidate the chronology of grammaticalization of this word. It is not used 
this way in Muong Bi, which instead uses pau ‘they’ (nor is this item seen in descriptions of other Vietic 
languages), but data for this word in other Muong lects is not readily available. 

The form for the proximal ‘this’, Vietnamese này, is comparable to a form with complex regional
distribution. This form is seen in Austroasiatic (*ni?; *nih ‘this’), Austronesian (*i-ni ‘this/here’),
Kradai (*naj°), and Cantonese nei1/ni1 呢 ‘this’ (but not widespread among other varieties of Chinese
or even Yue). Regardless of any possible deeper historical origins, it must be treated first as an
Austroasiatic etymon and a lexical retention, not a loanword. Other determiners in Vietnamese
constitute a phonaestheme-like analogical pattern of initials and rhymes (see Thompson 1985:142),
making etymological identification challenging. For example, words with initial /d/ (đó ‘that’, đấy
‘there’, đâu ‘where’) may be related, but there is currently no means of determining their etymological
origins or derivational sources.

All Vietnamese interrogative words in Table 6 are native etyma. However, the terms mô ‘where’
and chi ‘what’ are central Vietnamese dialect words that have different counterparts of unknown
etymological origin in northern and southern dialects (đâu ‘where’, as just noted, and gì ‘what’). Central
Vietnamese has retained some archaic traits in its grammatical vocabulary, including some of those
words’ phonological features, while northern Vietnamese innovated in other ways (Alves 2012). Some
Vietnamese interrogative terms are bisyllabic compounds, which cannot be reconstructed to the distant
past. The compounds sometimes contain a mixture of etymological sources (e.g., ‘why’ tại sao (at.how)
with a Sino-Vietnamese syllable (在 zài ‘at’) and a native etymon; ‘when’ bao giờ (how much.time)
with a native etymon bao and an ECL giờ (see Table 14)).


Table 6: Proto-Vietic interrogative words


Gloss | Proto-Vietic | Proto-Austroasiatic | Vietnamese
how many | *bal? | NA | mấy
what | #cl: | *(2)c1? | chi (dialectal)
where | #mo: | *m(o)? | mô (dialectal) ‘where/what’
where/which | *_mo: | NA | mé (dialectal)
who | *e: | NA | ai


3.2 Numbers and quantity expressions 
Compared to the low borrowability rates of pronouns, rates among number words are noticeably higher, 
with numerals two to ten having rates of 0.21 to 0.29. Numeral term systems are sometimes shared 
among languages, and in this region, language contact with Chinese has significantly affected the 
numeral systems of Tai (and Japanese and Korean). Nevertheless, numerals must be considered in 
studies of affiliation in language families. In WOLD, other quantity expressions vary in borrowability 
rates, but in general, they range from 0.09 to 0.23, only somewhat higher than truly basic vocabulary. 

Like pronouns and interrogative terms in Vietnamese, core numeral words are native etyma, 
stemming from either Vietic or Austroasiatic. The overall situation of numerals in the Austroasiatic language 
family suggests that numeral terms above four developed after the Austroasiatic dispersal (Sidwell 
n.d.), but still, all Vietic numerals have cognates in multiple branches of Austroasiatic, as noted for the 
terms for five to nine in Table 7. While Chinese was clearly a donor language for many cultural 
elements, including measure words, as described in Section 2.5, the retention of native numerals over a 
thousand years of language contact and a bilingual Sinitic-Vietic community is evidence of Vietic’s 
sociocultural status in the first millennium CE. Otherwise, we could expect to see a situation like that 
of Tai, in which core numerals have been replaced with the Chinese system. These include the Proto- 
Tai numbers for two to ten, and then through lexical compounding, to ninety-nine. In addition, in a list 
of lexical data for sixteen Kradai lects (Liang and Zhang 1996:44-45), most have forms clearly 
stemming from Chinese numeral terms for 100, 1,000, and 10,000 (with Thai as a notable exception, 
having other etymological sources for 100 and 1,000). This is very different from the circumstance in 
Vietnamese and Vietic broadly. 

An unanswerable question is how much—if at all—the Annamese Chinese community adopted 
Vietic numerals in their daily speech in a bilingual community. Regardless, other than some very limited 
circumstances noted below, Vietic speakers did not predominantly use Chinese numerals. 


Table 7: Proto-Vietic numbers 
Gloss | Proto-Vietic | Proto-Austroasiatic | Vietnamese 
one | *mo:c | *muuj | một 
two | *ha:r | *ɓaar (unlikely related to the Vietic form) | hai (cf. vài ‘a few’) 
three | *pa: | *pi? (*pe:) | ba 
four | *po:n? | *pun? | bốn 
five | *dam | NA (cf. Bahnaric, Monic, Aslian) | năm 
six | *p-ru:? > pru:? / klu:? | NA (cf. Bahnaric, Monic, Aslian) | sáu 
seven | *pas | NA (cf. Bahnaric, Monic, Aslian) | bảy 
eight | *sa‘m? | NA (cf. Bahnaric, Monic, Aslian) | tám 
nine | *cim? | *dciin[?] | chín 
ten | *juk | NA | chục 
ten | *mail | NA | mười 
hundred | *k-lam | NA | trăm 
few | *Prt | NA | ít 
more | *do:h | NA | nữa 
half | *CVda:h | NA | nửa 


There are only a few Vietic general quantity terms in Table 8, but these express significant quantitative 
senses (‘few’ and ‘more’). None are Austroasiatic etyma, but the fact that these are all reconstructable 
in Vietic suggests they could have been used before Sinitic-speaking groups arrived. 


Table 8: Proto-Vietic quantity words 


Gloss | Proto-Vietic | Proto-Austroasiatic | Vietnamese 
few | *O1:t | NA | ít 
more/additional | *da:h | NA | nữa 
half | *CVda:h | NA | nửa 


As for ECL quantity terms, there are several quantity expressions and a few numerals with very 
restricted usage. The impact of Chinese quantity expressions on pre-Viet-Muong and Vietic broadly is 
small but still notable, including ‘every’ and ‘much/many’. The Chinese words ‘pair/twins’ and 
‘many/much’ have spread widely throughout Vietic, as noted in Table 2. The numbers that have been 
borrowed have little to no usage in modern Vietnamese, and it is unclear exactly how these scattered 
items were used in the pre-Proto-Viet-Muong period. The number 10,000 was not likely to have been 
useful in daily interaction, except with a metaphorical meaning, as in the sense ‘crowded/numerous’ in 
Table 9 (muôn has phonological features showing it to be an older loanword than vạn). Also, the ECLs 
for ‘two’ (with different tones suggesting multiple periods of borrowing) and ‘four’ have no widespread 
usages in modern Vietnamese, and there are no other apparent ECL numerals. 

The entire set of Chinese numerals is available in the later Sino-Vietnamese pronunciations, but 
these are used primarily in literary writing or in limited compounds, not as free morphemes. Again, as 
noted above, Vietnamese has retained all core native number terms, so while the system of quantity 
terms was modified somewhat in the first millennium, the Vietic numeral system has received no 
apparent influence. 


Table 9: Early Chinese loanwords for quantity terms and numbers 


Category | Gloss ECL SV_ | Muong | Chinese OC MC 
general | enough du tlic; tu tu Je zu *[ts]ok-s | tsjowk 
every moi moi (thay) | #&§ méi *m'o? mwojX 
much, many nhiéu nhiéu ( tur ) fe rdo *[nlew nyew 
lacking/little thiéu thiéu thiéu | /?shdo | *[s.tlew? | syewX 
pair doi doi t6i “f dui | *[t}[ulp-s | twojH 
number | two nhi, nhi, nhi | nhi NA — ér *ni[j]-s nyijH 
four tu tte NA PU si *s lifj]-s sijH 
ten thousand muon, van muon Hwan | *C.ma[n]-s mjonH 
crowded, numerous van van muon & wan | *C.ma[n]-s | mjonH 


3.3 Locational words 

The borrowability rates of the two dozen locative words in WOLD can be divided into two main 
categories: those with rates below 0.10 and those with rates above 0.20. Rates of basic locational and 
directional terms with core cognitive linguistic properties are at the lower end (e.g., ‘up’ 0.0, ‘above’ 
0.03, ‘down’ 0.05, ‘far’ 0.06, etc.). In contrast, cardinal direction terms have notably higher 
borrowability rates of 0.20 or higher (i.e., ‘north’ 0.20, ‘south’ 0.26, ‘east’ 0.23, ‘west’ 0.24). This 
situation largely matches that of Vietnamese locational terms, with some exceptions. Most locational 
and directional terms in Vietnamese are native etyma, while cardinal terms are all Chinese loanwords 
from the later Sino-Vietnamese layer, not ECLs. However, a few locational terms were borrowed from 
Chinese in the early, pre-Sino-Vietnamese period. 


Table 10: Proto-Vietic location words 


Gloss Proto-Vietic Austroasiatic Vietnamese | Muong 
after #k'saw *kraw? sau khau 
before/forehead *k-lack NA truoc NA 
end/extremity > nipple | *go:j? / ko:j? NA cudi cudi 
far *s-naij? *[c]naj? ngdi (dialectal) (xa) 
inside *k-lo:n *kluuy ‘belly/middle’ trong tlong 
left side *k-la:j? NA trai tlai 
middle #Csah NA gitta khita 
right side *dam / tam *stiam; *stjuum dam (archaic) tam 
stay/be at *?oh NA or o 
top (upstream?) *k-lesn NA trén tliénh 
top/crown *nom? NA ngon ngon 
under *-ta:l? *ktyaal duodi tin 


Several of the Vietic reconstructions in Table 10 are Austroasiatic etyma (e.g., ‘after’, ‘under’, ‘inside’, 
‘right side’, and ‘far’), giving these items a truly deep history. The forms for ‘far’ and ‘right side’ are not used in 
modern mainstream Vietnamese, but they still demonstrate native etymological connections. As for 
‘inside’, it is a possible grammaticalized form of ‘belly’, following a noted grammaticalization cline (i.e., BELLY 
> IN (Heine and Kuteva 2002:53)), but it may not be possible to determine the timing of 
grammaticalization in Vietic. Chinese 中 zhōng ‘middle’ (OC *truŋ, MC trjuwng) is a comparable form, 
considering its phonetic shape, including a source *u, which often leads to Vietnamese ‘o’ /ɔ/, 
and the reasonable semantic overlap. However, I propose that we consider native sources first as a default unless 
stronger evidence shows that an external source is more likely. 

The number of locative ECLs in Vietnamese is smaller than the number of native terms, but these 
ECLs have significant semantico-syntactic functions. A general locative term for ‘side’ (in contrast with 
‘left’ or ‘right’, which are native terms in Vietnamese) was borrowed. There is an odd pattern, for which 
I see no explanation: the borrowing of only one member of each of several semantic pairs, including ‘near’ but not ‘far’, 
‘from’ but not ‘to’, ‘outside’ but not ‘inside’, and ‘bottom’ but not ‘top’. Overall, the locative ECLs 
unquestionably impacted parts of this lexical system in that early period, but a core native system is still 
mostly in place. 


Table 11: Early Chinese loanwords for locational terms 


Gloss ECL | SV_ | Muong | Proto-Vietic | Chinese OC MC 
bottom day | dé day NA Jeg di *t5ij? tejX 
from tie te | (po) NA Sai | *s[blift-s | dzijH 
near gan can | khénh *t-kon Fr jin *N-kor? gjt+nH 
northern (of wind) | bac | bac NA NA Jk béi pok *pok 
outside ngoai | ngoai | wai NA Sh wai | *[y]a[t]-s | ngwajH 
side bén | bién pén #pe:n 7 bian *pSe[n] pen 


As for possible syntactic influence, it seems such ECLs did not impact Vietic prepositional phrase 
structure. While the use of prepositions, not postpositions, in Vietic has not been shown conclusively, 
all evidence suggests that Vietic had right-branching structures in noun phrases and clauses. Because 
both Vietic and Sinitic were prepositional languages, loanwords for ‘from’ and ‘near’ could have 
readily fit into Vietic structures. However, Sinitic locational nouns, such as ‘outside’ and ‘side/the 
location of’, take preceding modifiers, such as adjectives, possessives, and determiners (e.g., 
Chinese 屋外 wū wài (house-outside) ‘outside the house’ (found in both early and modern Chinese), 
in which ‘house’ precedes the locative noun). This is unlike Vietnamese and other Vietic languages 
with post-noun modifiers (e.g., Vietnamese ngoài nhà (outside-house) ‘outside the house’, in which 
‘house’ follows). Thus, not only did Sinitic not impact Vietic phrase structure related to locational 
concepts; some of these locative words were also adapted to the typologically distinct Vietic syntactic 
patterns. 


3.4 Time words and calendar terms 

Table 12 presents three levels of borrowability rates of time words, focusing on the terms in this study: 
low (0.01-0.19), medium (0.22-0.45), and high (0.49-0.76). The lowest level consists of terms that are 
identifiable in the natural world (e.g., ‘day’ and ‘year’), while those at the upper end of the range are 
arbitrarily defined and culturally specific (e.g., ‘hour’ and ‘week’). As might be expected, those in the 
lower range are native in Vietnamese, as seen in Table 13, while those at the upper end are ECLs, as in 
Table 14. Those in the middle category include both native etyma and ECLs. 


Table 12: WOLD borrowability rates of time words 


Rates | Subcategories 
0.01 to 0.19 | • Terms related to ‘days’ (day, today, tomorrow, etc.) 
 | • Parts of the day (morning, midday, evening, night) 
 | • Month, year 
 | • Seasons (spring, summer, autumn, winter) 
0.22 to 0.45 | • Various (the beginning, immediately, always, ready, the season) 
0.49 to 0.76 | • Hour, week, days of the week 


A few of the Vietic time words in Table 13 are also Austroasiatic etyma. The words ‘day’ and ‘year’ 
are truly basic as they are observable natural phenomena. In contrast, ‘about to’ is a probable 
grammaticalized word derived from ‘ready/prepare’, and while the word form is reconstructable in 
Vietic, it is not possible to determine when the grammaticalization occurred. However, the fact that 
the etymon with the aspectual function is seen in both Muong and the conservative Vietic language Ruc 
(kʰrap; Nguyễn V. L. 1993:100), both with a retained early onset, suggests some time depth for this 
development. All other time words in Table 13 have Proto-Vietic reconstructions, including the parts of 
the day and two additional aspectual terms. The latter two, like ‘about to’, are likely grammaticalized 
from verbs, though ideally, textual data in early Vietnamese Nôm writing will clarify the timing of such 
developments. 


Table 13: Proto-Vietic time words 


Category | Gloss Vietnamese | Proto-Vietic Proto-Austroasiatic 
Units day ngay ie *tyil? 
month thang *k-ra:n? NA 
year nam *c-n-om < com *cnjam 
day after tomorrow mot #CVmu:t NA 
Parts morning som *k-ro:m? NA 
of midday/noon tua *k-la: NA 
days dark/evening toi *su:l? NA 
night dém *te:m NA 
Adverbials | right/immediate/just ngay *t-nar NA 
finished/complete xong #60: NA 
about to/prepared to sap #srap < k"rap (2) | *srap ‘ready/prepare’ 


The ECLs in Table 14 include a range of adverbials and units of time which do not have a natural world 
point of reference, as the native etyma do. Instead, these adverbials primarily function as 
aspectual/temporal markers for past, present, or future time frames. Other than ‘hour’, the unit terms 
have a somewhat more abstract sense of an instance of time. The grammaticalization paths of all of 
these words will require more careful sifting of both early Chinese and Vietnamese textual data. 
Nevertheless, the phonological features of these mark them as loanwords in that early period. 

Of note is the Chinese word 時 shí, which is listed twice in Table 14. The loanwords thus constitute 
a triplet with two ECLs and a corresponding Sino-Vietnamese form. In Chinese, this morph has a wide 
range of meanings, including ‘time (in the general sense)’, ‘currently’, and ‘hour’, among others. 
Correspondingly, this triplet in Vietnamese expresses multiple meanings: (a) giờ ‘time (in the general 
sense)’, ‘now’, and ‘hour’, (b) chừ ‘now’, and (c) thì/thời ‘time (in the general sense)’. The 
syllable thì is a later Sino-Vietnamese borrowing, while giờ is probably the oldest, 
considering the affricate onset. The timing of the borrowing of the ECLs is less certain, but their onsets 
and vowels show they are both ECLs. The Sino-Vietnamese morph thì is also a possible source of the 
homophonous high-frequency topic marker thì ‘then’, which became more dominant than the ECL bèn 
‘then’, as discussed in Section 3.6. 

Also of note is Vietnamese đang ‘during’, some aspects of which weaken my claim of its ECL 
status. The phonological form is reasonable, including the retention of *a from the early period of 
borrowing, in contrast with the diphthongization in the Sino-Vietnamese form. However, this word is 
used in modern Chinese as a kind of adverbial clause marker meaning ‘when’, not as a preverbal 
form as in Vietnamese (and I cannot find syntactic descriptions of this in early Chinese texts). 
Meisterernst (2011) shows how, in Pre-Tang Chinese, 當 dāng had multiple functions, including as a 
temporal and locative preposition, effectively “at” (in addition to a verb meaning ‘to match/correspond’ 
and a modal expressing necessity). Thus, if this is indeed an ECL, there was a reinterpretation with an impact 
on its semantico-syntactic features. Another issue is that some Tai languages have a comparable form. 
The progressive sense for a form [daan] is seen in several central and northern Tai languages (Gedney 
2008:115), though this is homophonous with ‘body’, a common source in grammaticalization clines. There is 
a Malay/Indonesian form sedang, which has a similar semantico-syntactic function and distribution, but 
considering the presyllable, this seems more likely to be a chance partial similarity. As for deeper 
historical origins, one word in isolation contributes nothing of value, and without wider distribution in 
Austronesian (perhaps instead grammaticalization from the homophonous word meaning ‘average’?), 
it does not even have relevance to Sagart’s Sino-Austronesian hypothesis. Regardless, the word would 
be a likely ECL into both Viet-Muong and Tai. Hopefully, additional ancient Chinese and/or 
Vietnamese textual data can clarify the matter. 


Table 14: Early Chinese loanwords for time terms 


Category | Gloss ECL SV Muong Chinese OC MC 
Adverbs always xang thuong NA # chang *[djan dzyang 
(redup.) 
during dang duong tang ‘= dang *San tang 
ever tung tang, tang NA céng *Tdz}‘on dzong 
just about to chuc truc ( khap ) zhi *N-t<1r>0k drik 
now chir thi (ca ni, He shi *Id]o dzyi 
chi ni) 
previous/old xua SO ho #J] chit *[ts]hra tsrhjo 
then bén tién (de é) {8 bian *ben-s bjienH 
Units a turn lan ludn lan 5 in *TrJu[n] Iwin 
batch; time/turn luot liét luot Fill lie *[r][e]t ljet 
hour (and ‘now’ gid’ thi do HE shi *Td]o dzyi 
and ‘time’) 
period of time; chung trinh NA f= chéng *I<r>en drjeng 
level 
time/instance phen phién, (ca) Ay fan *phar phjon 
phan 


There are other ECLs related to time, but not fully grammaticalized ones. Vuong (2002) notes a number 
of Chinese loanwords (what he sometimes labels “nativized” words, likely following Wang Li (1948, 
1958), though no phonological traits distinguish them from ECLs), such as thot ‘measure word’, van 
‘numerous’, birop ‘lack’, tua ‘must’. A few of these in Table 15 are ECLs which refer to punctuality. 
Overall, among ECLs, there is a domain of terms with reference to both grammatical issues of time 
(e.g., aspect) and timeliness (i.e., being late or on time), altogether supporting the idea of a bilingual 
language contact situation, one with a broader sociocultural impact. 


Table 15: ECLs with general reference to timeliness 


Gloss _| ECL | SV_| Muong | Chinese OC MC 
gradual dan | tuan dan Bll xin *7win *so. lu[n] 
intime | lép | cap lap Ke ji | *[m-k-]rap gip 
intime | kip | cdép | (lap) Ke ji | *[m-k-]rop gip 

late | chay | tri | (mudn) | #Echi | *1<rofj] drij 
urgent | kip | cap NA se jf *Tk]@eop kip 


ECL Calendar Words 


Calendar terms are abstract cultural words rather than function words. However, considering that (a) 
they overlap in function with other time words, (b) they constitute a semantic system, and (c) there are 
a dozen ECLs in this category, they deserve attention. The core concept of the Chinese New Year is an 
ECL, tết, as noted in Table 16. As for seasons, Vietnamese has borrowed all four terms, but 
only ‘summer’ belongs to the category of ECL, while the other three are of the later Sino-Vietnamese 
stratum. The only ECL time word for a part of the day is that for ‘noon’ (though the native trưa ‘noon’ is 
the dominant one in modern usage). This word is listed as the standard Sino-Vietnamese pronunciation, 
suggesting a Late Middle Chinese borrowing, but its vowel (i.e., ‘o’, which is statistically mostly 
associated with native words, instead of the expected ‘ô’) and tone (i.e., nặng instead of the expected 
ngã) suggest it was borrowed in an earlier period and retained as a formal Chinese character 
pronunciation. 

The Chinese calendar system, along with its many names of years and months, is more complicated 
than what can be described here. Readers can readily locate sources listing these forms in various 
recipient languages in the region. Lexical evidence demonstrates that the Vietic spoken in northern 
Vietnam after Chinese groups settled borrowed this calendar system early in the first millennium CE, 
as demonstrated by the phonological features as well as the spread of the system to other languages in 
the region. Based on the Chinese system of twelve earthly branches with animal terms, Vietic speakers 
developed their own version with several native etyma.¹ This system subsequently spread to Old Khmer 
(Ferlus 2013). It is this nativized Viet-Muong system that is in both Vietnamese and Muong (see Ferlus 
2013), while I can find no attestation of the original ECLs in available Muong sources. Regardless, we 
can hypothesize that the original Chinese system, and thus the ECLs in Table 16, were borrowed before 
that second native system was developed. 

What is notable in Table 16 is that there are only six ECLs of the twelve Chinese year terms, and 
of these, the first four are formally listed in dictionaries of Chinese character readings (an interesting 
detail as others were replaced by later-stage pronunciations). However, all four have clear ECL 
phonological features, notably tones, which place these borrowings in the first half of the first 
millennium. The reason for these seemingly scattered replacements of earlier forms is unclear. 


Table 16: Early Chinese loanwords for Chinese calendar terms 


Gloss ECL SV Muong Chinese OC MC 
new year tét tiét thét Efi jié *tstik tset 
noon ngo (SV) ngo (ca ntangay)| 4- wi | *[m].qha? | nguxX 
summer he ha (nong) B xia *To]‘ra? haeH 
1st month giéng | chinh, chinh, chanh chiéng jE zhéng | *C.tey tsyeng 
12th month chap lap chap Hes 1a *C rap lap 
1st earthly branch ti (SV) tw and ti NA FF zi *tso? tsiX 
2nd earthly branch xu (SV) suru and xu NA Ft chou *pru? trhjuwX 
3rd earthly branch | dan (SV) dan NA sg yin *[e](r)or yin 
4th earthly branch meo (SV) mao and meo NA [J mao *m‘ru? maewX 
5th earthly branch thin than NA be chén *[dJor dzyin 
8th earthly branch mitt vi NA FR wei *m[olt-s mj+jH 
10th year in the cycle dau ddu NA PY you *N-ru? yuwX 


3.5 Measure words 

WOLD does not include a category for measure words. The container word ‘bottle’ has a high 
borrowability rate of 0.60, and it overlaps with the class of measure terms, but this data does not allow 
generalizations. Still, words for containers, and thus related to trade, are in a sociocultural situation that 
can lead to lexical borrowing. The focus here is on general measure words, while classifiers cannot be 
considered in the earliest period of Sinitic-Vietic language contact. Fully grammaticalized classifiers in 
Chinese are not clearly attested until the later first millennium CE (e.g., Peyraube 1996, Behr 2009). 


¹ I take a different position from that of Norman (1985) and Ferlus (2013), who followed Coedès’s (1935) 
reasonable claim of a Viet-Muong origin of the dozen-animal calendar in Khmer and Tai languages. Norman 
proposed an Austroasiatic origin of the system in Chinese, and Ferlus attempted to provide further 
phonological support. This calendar system is not practiced among hilltribe Austroasiatic groups, not even (to 
my knowledge) conservative Vietic groups, so it appears limited to Viet-Muong. Moreover, I do not find their 
phonological evidence of proposed loanwords into Chinese to be strong enough to support these claims. It is 
also problematic to claim the borrowing of the word for ‘horse’, since horse-raising could not have been part 
of Vietic culture before the period of the oracle bones, when the Chinese calendar is attested. Regardless of the origins of 
the system in Chinese, the phonological data in Table 16 lean strongly in the direction of borrowing the words 
and overall system from Chinese in the early centuries of the 1st millennium CE. 

The category of measure words in Vietnamese shows clear lexical influence of Sinitic from the 
early period. Only a handful of Vietic items have been identified in Vietnamese, as in Table 17, whereas 
Table 18 contains seventeen ECL measure words. The later Sino-Vietnamese layer contains many more 
measure words, though these are outside the scope of the current study, and overall, this highlights the 
long-term tendency for such words to be borrowed into Viet-Muong languages. The classifier cái 
(generic classifier) has also been borrowed in Tai (see Alves 2015a), a kind of indirect evidence of the 
shared period of borrowing of these terms. However, Proto-Tai and later stage Tai vocabulary includes 
a significant number of grammatical ECLs not seen in Vietnamese, highlighting the distinct situations 
of Vietic and Tai with Sinitic. 


Table 17: Proto-Vietic measure words 


Gloss Proto-Vietic Vietnamese | Muong 
bunch (of bananas) *bo:n (AA *buun) buong (tléc) 
bunch/bouquet *po:? NA b6é 
fathom *p-la:s sai NA 
handful of bananas *c-na's (< c-m-a:s ?) NA nai 
handful/contents of two cupped hands *po:k voc poc 
lump #kok cuc coc 
mouthful/piece of *-mecn? miéng/mdnh | miéng 
span *c-kary gang (nang) 


A key question is how Vietic speakers incorporated these measure words into noun phrases in that early 
period. In modern Vietnamese noun phrases, the order is quantity, measure or classifier, and the head 
noun, followed by other modifiers (e.g., một quả cam tươi (one.CLSF.orange.fresh) ‘one fresh orange’). 
I have suggested that borrowing these kinds of words from Chinese impacted Vietnamese noun phrase 
structure, as their pre-noun position goes against the typology in the region (Alves 2001). No textual 
evidence for Viet-Muong precedes the second millennium CE, but in the earlier part of the second 
millennium, texts show classifiers as optional and their position in noun phrases as variable, both before 
and after nouns (V0 2014). As for the early Vietic noun phrase, I have hypothesized that quantity 
expressions were originally in post-nominal position, based on this order in Old Khmer texts from the first 
millennium CE and the change in progress implied by the variable position in early 
Vietnamese Nôm texts (Alves 2020). If so, we can speculate that there was stimulus to use such ECLs 
in this post-nominal position at some point after the borrowing of Chinese measure terms but before 
numerals and quantity terms could be moved in front of head nouns. This change in Vietic noun-phrase 
word order may have been in progress for centuries, but no specific time can be offered based on 
existing data. 

However, as noted in Section 3.2, the Vietic numeral system was not impacted by Sinitic. Native 
numerals moved to the front position along with Sinitic general quantity expressions and measure 
words. Perhaps this combined movement of native numeral terms with measure words is related to the 
loss of Annamese Chinese. We can assume that bilingual Annamese Chinese used Chinese numerals 
with Chinese measure words or classifiers in the pre-noun position, but this speech community 
eventually shifted to Viet-Muong. That bilingual situation could have contributed to the variability in 
the word order of quantified noun phrases in Viet-Muong, even as the Chinese numerals stopped 
being used. Again, there is no concrete evidence of this, making it speculation, but the speculation is 
based on observable data and possible language contact scenarios. 


Table 18: Early Chinese loanwords for unit terms 


Category Gloss ECL SV Muong | Chinese OC MC 
Architecture | level/floor tang tang thong fs céng | *N-s-t'on dzong 
story, floor, lau lau lau ## lou NONE NONE 
building 
unit for buildings can gian NA fay jian *k‘re[n] kean 
Classifiers | classifier for chiéc | chich chiéc zhi *tek tsyek 
vehicles 
generic classifier cdi ca cai {El ge *ka[r]-s kaH 
measure for tranh tranh NA lei zhén NONE NONE 
pictures 
measure word for thot that NA i pi *phi[t] phyit 


groups of animals, 
classifier for 
elephant, garden, 


raft 
unit for flat things bic phiic NA tf fu *pok pjuwk 

General situation/classifier Ccuoc CUuC cuoc Fay ja *[g](nyok gjowk 
for activities 
type giong | chung, | chong | f& zhong *k.ton? | tsyowngX 

ching 
type/species lodi loai thir 7 ei *[rJu[t]-s IwijH 
part phan | phan phan 43 fén *[m]- pjun 
pe[n]-s 
measure for qué quai NA £h gua | *[k}re-s | kweaH 
divinations 
ten thuoc inlength | dwong | trwong NA “zhang | *[d]ran? drjangX 
(archaic) 
Trade unit of thung | thang NA Ft shéng *s-ton sying 
measurement (for 
cereals) 
peck of dau dau; tau >} dou, *{99? tuwX 
dau dou 

tael lang | lwong lang = liang *TrJan-s ljang 


3.6 Conjunctive words 

The term “conjunctive” is here used loosely as the words in Table 19 include adverbial conjunctive 
words as well as conjunctions. WOLD contains only a few conjunctive words. In the database, the rate 
for ‘because’ is somewhat high at 0.35, while ‘and’ has a borrowability rate of 0.19, towards the lower 
end of the range. The sense of ‘with’ is very low at 0.09, but overlap of usage of words ‘with’ and ‘and’ 
is common in languages of Asia, which complicates the situation. 

The number of ECLs in this domain is small, but the items are functionally significant. As shown 
in Table 19, it is precisely these words ‘with’, ‘and’, and ‘because’ that have been borrowed in the early 
period. Details of multiple borrowings of Chinese 共 gòng ‘altogether’, a triplet of two ECLs and a 
Sino-Vietnamese morph, have been described elsewhere (Alves 2018b). 

In contrast to the several conjunctive ECLs, only one native conjunction, hay ‘or’, Proto-Vietic *hi:, 
is in Vietnamese. Again, additional conjunctive words are seen in the later Sino-Vietnamese layer, but 
the ECLs already show impact on multi-clausal constructions. One speculation is that the use 
of parataxis for conjunctive, cause-effect, and conditional relations (e.g., Ruc in Vietic (Nguyễn V. L. 1993:125) 
and Pacoh in Katuic (Alves 2015b:892-893)) left gaps to fill with lexemes. Another possibility is that Chinese 
had a somewhat more formal status and involved higher-frequency usage of connective words. This is 
the case in modern Southeast Asian languages: parataxis is more associated with minimal lexical 
marking, while more formal registers tend to include lexical clause linkers (Jenny 2021:608-611). 


Table 19: ECL conjunctions 


Gloss ECL | SV_ | Muong | Chinese OC MC 
and va | hoa | (poi) All hé *T oj hwa 
and, with | cang | céng | cong | t£ gong | *N-k(r)on?-s | gijowngH 
also cling | cong (i) FE gong | *N-k(r)on?-s | gjowngH 
because vi vi (tai) Fy wei *Gv(raj-s hjweH 
if/supposing | gid gia da Ey jia *Co.k‘ra? kaeX 


3.7 Comparative terms 

The data in WOLD includes only two comparative senses: ‘similar’ with a borrowability rate of 0.17 
and ‘more’ at 0.23, both medium-low rates. The former sense is seen in corresponding ECLs in Table 
20, while the latter is manifested as an instance of native grammaticalization and likely a part of a 
regional typological tendency. 

Vietnamese has only one comparative term that stems from an early reconstructable etymon: 
Vietnamese hơn ‘more than’, Proto-Vietic *ha:n, a possible grammaticalization of Proto-Austroasiatic 
*hon, *ha:n ‘to grow/to increase’ (see Alves, Jenny, Sidwell 2020:325). This grammaticalization path 
fits into the ‘surpass’ type construction seen among languages in the region, such as Yue Chinese, Tai, 
Lao, and others (see Ansaldo 2010). However, there is currently insufficient data to posit the structure 
of comparative constructions in early Vietic preceding this grammaticalization event. It is used with the 
comparative function in other Vietic languages, but this etymon has not been carefully studied in them, 
nor have I found studies of it in early Nôm texts. Lastly, the common Vietnamese intensifier lắm ‘very 
much’ is restricted to Viet-Muong languages, so ultimately, there is little reconstructable lexical data in 
this domain. 


Table 20: Early Chinese loanwords for comparison terms 


Gloss ECL SV Muong Chinese OC MC 
as muchas | tay té NA BE gi *[dz]%9j dzej 
contrary | nguoc ngich nguoc wi ni *prak ngjaek 
equal; flat | bang binh pang + ping *m-bren bjaeng 
more SO cang | canh; canh cang  géng, géng | *k‘ran-s & *k‘ran | kaeng 
rather kha kha (khi hoi) Hy ké *[k]alj]? khaX 
similar tua tu NA (Dh si *s9,lo? ZzixX 
to resemble to tu NA (Dh si *s9.19? ZiX 
to compare vi ty NA EL bi *C.pij? pjijX 


As for ECLs, Table 20 contains several terms that express comparison and similarity. Chinese 似 sì 
‘similar’ was borrowed twice in the early pre-Sino-Vietnamese period, resulting in a triplet. Of the two 
ECLs, distinguished only by vowel type, I am not sure which is the older borrowing, only that they have 
vowels corresponding to Old Chinese reconstructions (also note Schuessler’s Late Han Chinese *ziəᴮ 
with a diphthong). Vietnamese khá ‘rather’ is from Chinese 可 kě ‘able’, but it also has the sense of 
‘very’ as an intensifier for adjectives, though this is less common in Chinese and somewhat more 
archaic. Vietnamese bằng ‘equal to’ is a probable grammaticalized form of Chinese ‘level’. As it is 
found in all thirty varieties of Muong (Nguyễn V.T. 2005:176), it most likely dates to a pre-Viet-Muong 
period, and we can even hypothesize that this could represent a development in Annamese 
Chinese. There is, of course, no means of verifying or excluding this idea, but it would seem more likely 
than grammaticalization of a Chinese morph only in pre-Viet-Muong and not in the source variety of 
Chinese. A last thought is that, as suggested in Section 3.6 about conjunctive words, the borrowing of 
comparative words may have filled a lexical gap, as some data show parataxis as a 
means of forming comparative structures, but again, this is not a testable hypothesis. 
3.8 Modal words 

Modal words are not included in the data of WOLD, so there is no statistical point of reference for 
borrowing of such words. It is, nevertheless, reasonable to assume that loanwords expressing modality 
would most likely be borrowed in a bilingual community or other significant language contact situation. 
As for native etyma, there is no clear evidence of reconstructable modal elements at the Proto-Vietic 
level, whether preverbal words or sentence-final particles. There is the Central Vietnamese negative 
morph no ‘no/not’, for which there is a reconstructed Proto-Vietic *-noh. However, with attestations in 
only a variety of Muong and four Pong lects, it can at best only be connected to a later sub-branch stage 
within Vietic. 

In contrast, there are several probable ECLs expressing types of modality (condition, obligation, 
ability, passive voice, and emphasis), as shown in Table 21. Not all of these are in widespread usage in 
modern Vietnamese, but considering the lack of native modal terms, these loanwords are, or were, 
significant as they serve core modality functions. For the ECL đừng, whose Chinese source means ‘to stop’, 
the prohibitive meaning ‘don’t’ is a probable grammaticalized function in Vietnamese. This meaning extends 
at least back to de Rhodes’ 1651 dictionary, so it is not a recent development. However, I find no attestations 
of it in the Muong Bi data, and it is not included in Nguyễn V. T.’s (2005) comparative Muong data, so 
it might not date to the Proto-Viet-Muong stage. Careful sifting of earlier Nôm texts might or might not 
show the use of this word, which is normally a spoken functor. Regarding nổi ‘able’, the semantics are 
slightly different from the Chinese source, suggesting possible grammaticalization in Vietnamese, but 
the form matches the expected form well. Finally, được ‘get/able’ matches the function in Chinese well, 
and the segments are reasonable. The tone height is unexpected, but not enough to exclude the etymology, 
as there are no viable alternative etymological sources. 


Table 21: ECL modal words 


Gloss | ECL | SV | Muong | Chinese | OC | MC
able | nổi | nại | noi | 耐 nài ‘able to endure’ | *nˤə-s | nojH
by, due to | bởi | bị | poi | 被 bèi | *m-pʰ(r)ajʔ-s | bjeH
don’t (prohibitive) | đừng | đình | (ché) | 停 tíng | *Cə.[d]ˤeŋ | deng
get/able/(passive) | được | đắc | (an) | 得 dé | *tˤək | tok
must (archaic) | tua | tu | NA | 須 xū | *[s]o | sju
sentence particle (emphatic) | thay | tai | NA | 哉 zāi | *[ts]ˤə | tsoj


One problematic item is worth noting. The preverbal bị, from the Chinese bèi 被 passive marker, is a 
common adversative (i.e., indicating a negative effect) passive marker in Vietnamese, but it is a relatively 
recent development, not one seen in early Nôm texts. In Table 19, a preposition with passive-like 
features, bởi ‘by/due to’, has ECL phonological features, though textual data still need to be checked to 
see how early it can be attested, making it somewhat tentative. 

Lastly, Vietnamese phải has a partially passive-like function, in addition to the senses of ‘correct’ 
and ‘must’. It is even listed with this sense in de Rhodes’ 1651 Vietnamese-Portuguese-Latin dictionary. 
This word is superficially comparable in form to Old Chinese *m-pʰ(r)ajʔ-s (MC bjeH, SV bị) of Chinese 
bèi 被. However, the passive function in Sinitic had not developed at the stage of Old Chinese, but 
rather has been hypothesized to have developed only in the late first to early second millennium 
(Peyraube 1996:177). Instead, this term likely followed a grammaticalization pattern in Southeast Asia 
involving words meaning ‘correct’, ‘must’, and/or ‘contact’ which developed a passive-like function. 
This is the case with Khmer traw ‘correct/must’ and Thai thuuk ‘correct, conforming; to contact’, both 
of which are also passive markers in those languages (see Matisoff 1991:425-426, but his claim of 
shared origin of Vietnamese được and Thai thuuk is not supported by historical phonological data). This 
has developed even in the Vietic language Thavung, with a distinct etymon cah meaning ‘correct’ and 
also marking the passive voice. Thus, comparative evidence suggests it is an instance of chance 
phonological similarity of a distinct etymon. 


4 Summary of the data and the extent and limits of early structural impact 

The situation presented in the previous sections both highlights core native Austroasiatic and Vietic 
elements in Vietnamese grammatical vocabulary and shows clear evidence of early grammatical lexical 
borrowing from Sinitic in the first several centuries of the first millennium CE. The categories of 
grammatical retentions and borrowings can be summarized as follows. 


1. Categories of significant lexical retentions: A solid set of numeral terms (‘1’ to ‘10’), core 
pronouns (1s, 2s, 3s, 2p), question words (‘what’, ‘where’, ‘who’), locational words (several terms), 
and time words (‘day’, ‘month’, ‘year’, etc.) are native retentions. These are precisely the types of 
vocabulary commonly considered in establishing linguistic affiliation and thus underscore the 
Vietic and Austroasiatic origins of Vietnamese, despite the long-term impact of language contact 
with and lexical borrowing from Sinitic and later stages of Chinese languages. 

2. Categories with significant quantities of loanwords: Sinitic has contributed to pre-Viet-Muong 
Vietic a modest number of locational words and a significant number of measure words and 
connective and modal terms. ECLs with temporal functions include several of the twelve-animal 
zodiac calendar, which thus represent a kind of cultural borrowing, but together with other early 
loanwords related to time, the semantic domain of time was notably lexically impacted in this part 
of Vietic from this early period. 

3. Areas of uncertainty: Some issues are uncertain or unanswerable. Chance similarity is still 
possible, and so with few exceptions, absolute certainty cannot be claimed. The issue of 
grammaticalization highlights the challenges in identifying loanwords. Some of the Austroasiatic 
etyma and ECLs have undergone grammaticalization at later periods after speciation of Vietic or 
Viet-Muong, and the precise timing of the grammaticalization of classifiers and passive-voice 
markers in Chinese is still under consideration. Moreover, some of the types of grammaticalization 
are seen in various languages in Southeast Asia. Thus, we cannot always know with complete 
certainty whether words were borrowed as grammaticalized morphs. Also, the timing of some 
developments in Viet-Muong is unclear: some clearly occurred in the second millennium, but in 
other cases, evidence for structural changes in the first millennium of the ECL period is lacking or 
unclear. 


Overall, despite any uncertainties, the data shows Vietnamese grammatical vocabulary contains a core 
of Vietic etyma (numerals, pronouns, locational terms, etc.) with the borrowing of a significant number 
of grammatical ECLs with the kinds of shared functional elements (measure words, aspectual words, 
comparative words, etc.) that likely facilitated communication in a bilingual community. 

This suggests a question: Can a language borrow grammatical vocabulary but not undergo morpho- 
syntactic or semantico-syntactic change? For pre-Proto-Viet-Muong, there is no direct evidence, such 
as textual data, to answer this question. Only in the early second millennium is there textual data of 
archaic Vietnamese to show possible impact of language contact with Sinitic, and thus any changes had 
to have occurred before that, but with no means of determining timing (i.e., anytime from 1 CE to the 
turn of the second millennium). Regardless, I argue that these grammatical ECLs could have largely fit 
into the existing Vietic morphosyntactic structures and may not have impacted Vietic syntactic 
structures for several centuries approaching the development of the Viet-Muong sub-branch. Also, 
crucially, there are instances in which the Sinitic elements have been fit into the typological structures 
of Viet-Muong. An example is the post-nominal modifiers for locative nouns (Section 3.3). While there 
is no existing textual data for that early period of pre-Proto-Viet-Muong, it seems possible, and even 
probable, that other instances of structural adaption of grammatical ECLs occurred. 

Furthermore, numerous grammatical domains in Vietnamese have seen little to no influence due 
to contact with Sinitic in that early period (e.g., negation terms (unlike Tai, see Pittayaporn et al. 2014), 
core numerals (again, unlike Tai), clause-final particles, etc.). Also, Vietnamese lacks many key 
Chinese typological features (e.g., phrase-final nominalizing particles, pre-nominal modifiers, the 
A-not-A question pattern, clause-initial time adverbs, etc.). Vietnamese also has typological features that 
are unlike those in varieties of Chinese (e.g., post-clausal adverbial elements, intensifiers and negation 
words in both pre- and post-verbal positions, etc.). As both core pronouns and clause-final particles 
have pragmatic functions, their lack of borrowing from Sinitic offers a sense of the limits on the lexical 
borrowing and the nature and intensity of Sinitic-Vietic bilingualism as well as the limits of the ultimate 
linguistic impact of Sinitic on late Vietic, certainly in the pre-Viet-Muong period. Thus, it is reasonable 
to speculate that in the first several centuries of Sinitic-Vietic contact, there was a substantial amount 
of bilingualism, and that the evidence supports a scenario of a socioculturally robust Vietic speech 
community speaking a language with Austroasiatic typological and structural features and native 
grammatical vocabulary for centuries into the period of Chinese settlement in the region. 

To provide additional context for the borrowing of grammatical morphs and questions of related 
syntactic structure, it is also useful to consider the differing language contact circumstances in the pre- 
Viet-Muong period in the first millennium versus the developments in Viet-Muong moving into the 
second millennium. In the first millennium, pre-Viet-Muong Vietic still would have had a typical 
early Austroasiatic typological template (i.e., non-tonal and polysyllabic), predating the period of 
Southeast Asian typological convergence. By the mid-second 
millennium, Viet-Muong had completely lost presyllabic material (though Vietnamese retained cluster 
onsets into the 1800s, and Muong lects still have them), it had fully developed tones, and it had 
undergone substantial typological convergence in the Southeast Asian language area. These features 
are listed in Table 22 (adapted from Alves 2020:54), in which hypothesized differences between Vietic 
and Vietnamese are highlighted. 


Table 22: Linguistic structural features of Vietic versus Vietnamese 


Linguistic feature | Vietic | Modern Vietnamese
Information structure | Topic-comment | Topic-comment
 | Middle voice only, no explicit lexical marking of the passive voice | Commonly employs lexical marking of the passive voice
Clauses | SVO/AVP | SVO/AVP
Noun-phrase structure | Noun + quantity and modifiers | Quantity + noun + modifiers
Locational terms | Prepositional structure, but locational nouns with postposed modifiers | Prepositional structure, but locational nouns with postposed modifiers
Modality | Unknown | Preverbal modals, modal sentence-final particles
Phonology | Clusters | No clusters (but retained in Vietnamese into the 1800s, still retained in Muong lects)
 | Sesquisyllables | No presyllables (but textual evidence of some in the 1200s)
 | No tones, possible phonation | Complex tone system with phonation
Morphology | Derivational prefixes and infixes | No affixes, only compounding
 | Alternating reduplication | Alternating reduplication


Connecting lexical borrowing and syntactic structural aspects, we can see a correspondence between 
grammatical ECLs (i.e., measure words and passive markers) and the change in position of quantity 
phrases in noun phrases and the addition of lexical marking of the passive voice. The number and types 
of native Vietic measure words are limited, while a significant number of ECL measure words (but not 
classifiers until later) were borrowed into pre-Proto-Viet-Muong. No native numbers were replaced, 
with a few early ECL numbers borrowed with highly restricted usage. In the era under consideration, 
Vietic likely had very different noun phrase structure from that of modern languages: strictly post- 
nominal elements and no grammaticalized classifiers, as comparative evidence in Austroasiatic 
indicates (Alves 2020). It is uncertain when or how the restructuring of the Vietic noun phrase occurred, 
though the pre-nominal position of numbers and classifiers is attested in Vietnamese Nôm texts by the 
1300s. 

The borrowing of several ECL conjunctive, comparative, and modal words also suggests the 
possibility that Sinitic had a larger number of such lexemes than in Vietic at that time. Indeed, in a 
review of Austroasiatic grammatical vocabulary, preverbal modal verbs, sentence particles, and 
classifiers are all lacking, whereas there are full sets of pronominal, interrogative, and locative terms 
(Alves, Jenny, and Sidwell et al. 2020). Also, as noted, parataxis (juxtaposed clauses without explicit 
lexical marking for semantico-syntactic interpretation) may have left positions to fill with such lexical 
items, but such a supposition will require additional data to clarify. 

Recent borrowing of similar types of grammatical words can be seen among modern Austroasiatic 
languages in the region: So Thavung from Thai (see Srisakorn 2008), Semai from Malay (Alves 
fieldnotes 1998), and Chut lects from Vietnamese (see Nguyễn V. L. 1993 for Ruc, Babaev and 
Samarina 2018), and certainly many others. All these minority languages with less sociocultural status 
than the national languages they are in contact with have thus far retained their Austroasiatic typological 
structural profiles. In contrast with these examples, Vietic was not a minority language and was, as 
indicated by archaeohistorical and comparative linguistic studies, a language spoken by a community 
in an emerging proto-urban area (see Alves 2021). Again, native numbers, pronouns, and many locative 
terms were retained in Viet-Muong, so this Vietic speech community had some sociocultural status with 
their Sinitic-speaking neighbors. 


5 Evidence of Annamese Chinese and the development of Viet-Muong 

Twenty years ago, I wrote (Alves 2001:222) that the impact of Chinese on Vietnamese was “primarily 
of lexical influence with some accompanying phonological influence” in light of Thomason and 
Kaufman’s hierarchy of borrowing scale (Thomason and Kaufman 1988:74-75). I also posited that 
various structural typological features of modern Vietnamese reflect regional typological changes, not 
only the impact of language contact with Chinese. Based on the current data, this overall position is still 
supported, though with somewhat more evidence of structural influence of Sinitic. 

I have also suggested in some publications (e.g., Alves 2009) that the language contact in the 
Viet-Muong period leading to borrowing, including grammatical loanwords, was influenced to a good extent 
by literary transmission of words, paralleling the Japanese situation. I argued this at the time because 
evidence for widespread bilingualism in the first millennium in northern Vietnam was lacking. 
However, since then, Phan (2010, 2013) has presented arguments based on historical and linguistic 
evidence precisely supporting a large Chinese community in northern Vietnam in the first millennium. 
He posits that what he terms “Annamese Chinese” developed as a substantial Chinese variety in northern 
Vietnam, but that speakers of this variety completely shifted to Viet-Muong in the early second millennium. I 
have since published data which further supports such a hypothetical Chinese community (hypothetical 
as there is no concrete historical textual description of it, only general mention of Chinese groups 
periodically migrating), including both lexical borrowing and structural adaptation (Alves 2016, 2017, 
2018a, 2018b, 2020). This study provides further support via the early borrowing of grammatical 
morphs, in a period likely before widespread literacy in Chinese among Vietic speakers. 

Indeed, at this point, the linguistic evidence in support of a culturally influential Chinese 
community in northern Vietnam throughout the first millennium continues to grow, making it 
increasingly difficult to account for the ECL data without such a community. This bilingual and 
bicultural scenario would eventually have had an impact on practices of literacy in northern Vietnam in 
the Viet-Muong speech community. We do not have a clear understanding of the development of 
literacy in northern Vietnam in the first millennium. Nevertheless, literacy undoubtedly became more 
prominent over the centuries, as evidenced by the construction of the first university in Vietnam, the 
Văn Miếu 文廟 ‘Temple of Literature’ in Hanoi in 1070 CE, built a century after Vietnamese political 
independence from China. Assuming a Chinese community was still part of the language ecology of 
northern Vietnam at that point, we can expect a lingering bilingual situation of Viet-Muong and 
Annamese Chinese. But we can also assume that growing literacy-based cultural practices contributed 
in various secondary ways to the lexical borrowing and codification of character readings of Late 
Middle Chinese, but such possible impact of literacy on the sociolinguistic conditions of lexical 
borrowing occurred after the ECL period. 

Altogether, it was the Viet-Muong speech community that borrowed the largest quantity of Sinitic 
vocabulary, and this community is the most typologically divergent of the Vietic sub-branches. The 
other sub-branches of Vietic borrowed Sinitic content words, and a few grammatical words made their 
way into Vietic broadly. However, the so-called “Sinicization” of Viet-Muong generally and 
Vietnamese specifically was far from immediate; rather, it took several centuries for Viet-Muong 
to undergo structural convergence with Annamese Chinese. And if one considers that (a) tones 
in Viet-Muong may have developed only towards the end of the first millennium (Alves 2019), (b) the 
complete loss of presyllabic material only occurred in the early second millennium (Shimizu 2015, 
Gong 2019), and (c) onset clusters with [r] and [l] lingered into the 1800s (Vu 2019) and are still 
retained in varieties of Muong (Nguyễn V. T. 2005), the concept of “Sinicization” certainly does not 
account for the entire typological situation in Viet-Muong. Instead, Viet-Muong is the result of very 
long-term language contact with Sinitic but also neighboring languages, leading to shared regional 
exchange and typological tendencies, all on top of an Austroasiatic typological template which also 
likely conditioned the ways in which some linguistic features evolved. 
Abstract 

In most modern Tai varieties, Proto-Tai voiced stops have been devoiced: each developed 
into another class of sound, but none split into multiple 
consonants. For example, Proto-Tai *b became either /p/ or /pʰ/ in most modern Tai 
languages. However, in Trang Dinh Nung, a variety of the Nung language spoken in 
Trang Dinh district, Lang Son province, Vietnam, each Proto-Tai voiced stop split into 
two consonants. For example, Proto-Tai *b split into both /p/ and /pʰ/ in Trang Dinh Nung. 
This paper aims to show that this phenomenon in Trang Dinh Nung is related to 
tone and constitutes evidence that the B tone in Proto-Tai had a low 
pitch. 


Keywords: Tai languages, tone, consonant split 
ISO 639-3 codes: cmn, ltc, nut 


1 Introduction 

This paper is aimed at demonstrating that each Proto-Tai voiced stop split into two consonants in Trang 
Dinh Nung (a variety of the Nung language used in Trang Dinh district) and showing that this 
phenomenon is related to tone. In Section 1, I discuss how Proto-Tai voiced stops have changed and are 
realized in modern Tai varieties, and I provide background information about Trang Dinh Nung, which 
is the subject of this paper. In Section 2, I describe the phonological system of Trang Dinh Nung, and I 
discuss etyma with Proto-Tai voiced stops to clarify the consonant split in Trang Dinh Nung. In Section 
3, I discuss the relationship between the consonant split and tone. In Section 4, I discuss the phonetic 
feature of the Proto-Tai B tone on the basis of the consonant split in Trang Dinh Nung. 


1.1 Proto-Tai voiced stops in modern Tai varieties 
Modern Tai varieties can be classified into three types according to how Proto-Tai voiced 
stops changed. In modern Tai varieties, Proto-Tai voiced stops (1) became simple voiceless stops, (2) 
became aspirated stops, or (3) were retained as voiced stops. Few modern Tai varieties belong to 
type 3.¹ Instead, Proto-Tai voiced stops are devoiced in most modern Tai varieties, such as *b > /p/ or 
*b > /pʰ/. 

In Trang Dinh Nung, however, the Proto-Tai voiced stops split into two consonants. For example, 
*b has split into /p/ and /pʰ/. To the best of my knowledge, this split pattern in Trang Dinh Nung is 
uncommon in modern Tai varieties.² 


1.2 Trang Dinh Nung 

The Nung language is spoken by the Nung people, who live mainly in northeast Vietnam. In the 2019 
Vietnam Population and Housing Census (General Statistics Office of Vietnam 2020), the Nung 
population totaled 1,083,298. Trang Dinh district is part of Lang Son province, Vietnam, as shown in 


1 The Proto-Tai voiced stops “have been devoiced in most modern Tai varieties, except for a few dialects on the 
Sino-Vietnamese border” (Pittayaporn 2009:110). 

2 Pittayaporn (2009:110) indicates that most modern Tai varieties reflect Proto-Tai voiced stops “either as plain 
/p-/, /t-/, /c-/, and /k-/, or as aspirated /pʰ-/, /tʰ-/, /cʰ-/ and /kʰ-/”. 
Figure 1. The Nung language belongs to the Central Tai group of Tai languages (Li 1960). The Nung 
ethnic group contains subgroups whose names correspond with their original homelands in China. 
According to Phan and Khong (eds.) (1999), there are three subgroups of Nung in Trang Dinh district: 
Nung Chao, Nung An, and Nung Phan Slinh. More than ninety percent of the Nung in Trang Dinh 
district are Nung Chao. According to Fang (1989:163), the ancestors of Nung Chao migrated from 
Longzhou (龍州), which is located in present Longzhou county.³ The Trang Dinh Nung data in this 
paper are from my fieldwork in Trang Dinh district, Lang Son province, Vietnam. The informant is a 
Nung Chao man, who was born and raised in Trang Dinh district. 


Figure 1: Map of Lang Son Province 


[Map of Lang Son Province showing Trang Dinh, Binh Gia, Van Lang, Lang Son City, Cao Loc, Bac Son, 
Van Quan, Loc Binh, Chi Lang, and Dinh Lap; north arrow and 60 km scale bar.] 


The syllable structure of Trang Dinh Nung is C1(C2)V(C3)/T. Table 1 shows the Trang Dinh Nung 
consonants that can occur as C1. Two Trang Dinh Nung consonants, /ɓ/ and /ɗ/, correspond to the 
Proto-Tai implosives *ɓ and *ɗ, respectively. In other words, Proto-Tai voiced stops are devoiced in Trang 
Dinh Nung, as they are in most modern Tai varieties. 


3 According to Fang (1989:163), the Nung An people had migrated from Jie’anzhou (結安州), and the Nung 
Phan Slinh people had migrated from Wanchengzhou (萬承州). Jie’anzhou and Wanchengzhou are located in 
present Tiandong county and Daxin county, respectively. 
Table 1: C1 consonants in Trang Dinh Nung 


 | labial | dental | palatal | velar | glottal
voiceless stops | p | t | | k | ʔ
aspirated stops | pʰ | tʰ | | kʰ |
implosives | ɓ | ɗ | | |
affricate | | | tɕ | |
nasals | m | n | ɲ | ŋ |
voiceless fricatives | f | s | | | h
voiced fricatives | v | z | | |
lateral fricatives | | ɬ | | |
lateral | | l | | |

In Trang Dinh Nung, /w/ and /j/ can occur as C2. Table 2 shows the combinations of C1 + C2 in Trang 
Dinh Nung. The velar sounds can occur with /w/, and the bilabial sounds can occur with /j/. Regarding 
/ʔw/, it is found only in onomatopoeia. 


Table 2: Combinations of C1 + C2 in Trang Dinh Nung 


kw kʰw ŋw (ʔw) 
pj ɓj pʰj mj 
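
The onset restrictions described above can be checked mechanically. The following Python sketch is only an illustration: the function name `onset_is_attested` is hypothetical, and the cluster inventory assumes that Table 2 is read as /kw, kʰw, ŋw, (ʔw)/ and /pj, ɓj, pʰj, mj/.

```python
# Illustrative sketch only; names and the reading of Table 2 are assumptions.
VELARS = {"k", "kʰ", "ŋ"}          # C1 consonants that combine with /w/
BILABIALS = {"p", "pʰ", "ɓ", "m"}  # C1 consonants that combine with /j/


def onset_is_attested(c1: str, c2: str, onomatopoeia: bool = False) -> bool:
    """Return True if the cluster C1 + C2 is among those listed in Table 2."""
    if c2 == "w":
        # /ʔw/ is reported only in onomatopoeic words.
        return c1 in VELARS or (c1 == "ʔ" and onomatopoeia)
    if c2 == "j":
        return c1 in BILABIALS
    return False  # only /w/ and /j/ can occur as C2


print(onset_is_attested("kʰ", "w"))
print(onset_is_attested("t", "j"))
```

The sketch covers clusters only; a bare C1 without C2 is outside its scope.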


Table 3 shows the consonants that can occur as C3. The C3 consonant /ɰ/ can occur only after /a/; the 
other C3 consonants can occur after any vowel. 


Table 3: C3 consonants in Trang Dinh Nung 


p t k 
m n ŋ 
w j ɰ 


Trang Dinh Nung has six tones, which developed from Proto-Tai tones, as shown in Table 4. 
Although the Proto-Tai D tone has split according to vowel length in many modern Tai varieties, it has 
not split in Trang Dinh Nung. 


Table 4: Tonal split pattern of Trang Dinh Nung (Adapted from Gedney 1972:202) 
 | A | B | C | DS | DL
Voiceless friction sounds (*f-, *hm-, *pʰ-, ...) | 1 [33] | 3 [35] | 5 [213] | 3 [35] | 3 [35]
Voiceless unaspirated stops (*p-, *t-, *k-, ...) | 1 [33] | 3 [35] | 5 [213] | 3 [35] | 3 [35]
Glottal (*ɓ-, *ɗ-, *ʔ-, ...) | 1 [33] | 3 [35] | 5 [213] | 3 [35] | 3 [35]
Voiced | 2 [32] | 4 | 6 | 4 | 4
2 Reflexes in Trang Dinh Nung 

Table 5 shows the Proto-Tai consonant reflexes in Trang Dinh Nung.⁴ One Proto-Tai consonant 
corresponds to one Trang Dinh Nung consonant in principle, but *b, *d, *ɟ, *g, and *ɣ, which are 
highlighted in gray in Table 5, have two reflexes in Trang Dinh Nung. 


Table 5: Consonant reflexes in Trang Dinh Nung 


 | | labial | alveolar | palatal | velar | uvular | glottal
stops | voiceless | *p > p | *t > t | *c > tɕ | *k > k | *q > kʰ |
stops | voiced | *b > p, pʰ | *d > t, tʰ | *ɟ > tɕ, s | *g > k, kʰ | *ɢ > k |
stops | glottalized | *ɓ > ɓ | *ɗ > ɗ | *ʔj > z | | | *ʔ > ʔ
fricatives | voiceless | | *s > ɬ | *ɕ > tɕ | *x > kʰ | *χ > kʰ | *h > h
fricatives | voiced | | *z > ɬ | *ʑ > tɕ | *ɣ > k, kʰ | |
nasals | voiceless | *hm > m | *hn > n | *hɲ > ɲ | *hŋ > ŋ | |
nasals | voiced | *m > m | *n > n | *ɲ > ɲ | *ŋ > ŋ | |
liquids and glides | voiceless | *hw > v | *hr > h; *hl > l | | | |
liquids and glides | voiced | *w > f | *r > ɬ; *l > l | | | |


Among the five Proto-Tai consonants that split into two series in Trang Dinh Nung, *ɣ is an exception: 
it is a fricative, not a stop, at least in the Proto-Tai reconstruction. However, the split of *ɣ in Trang 
Dinh Nung also occurs for the same reason as the split in Proto-Tai voiced stops, as shown in Section 
2. 

To show that the ‘1-into-2 split’ phenomenon in Trang Dinh Nung is uncommon in Nung 
varieties, I compare etyma in Western Nung from Gedney’s word list (Hudak 2008) with etyma in 
Trang Dinh Nung. Western Nung (Hudak 2008) has 23 initial consonants, as shown in Table 6. The 
voiced stops /b/ and /d/ in Western Nung correspond to Proto-Tai *ɓ and *ɗ, respectively. Thus, it is 
clear that in Western Nung, the Proto-Tai voiced stops developed into voiceless stops, similar to most 
modern Tai varieties. In Western Nung, “the only recorded initial cluster is [kw] in kwaay’ — stubborn” 
(Hudak 2008:36). 


4 The Proto-Tai reconstructions by Pittayaporn (2009) are adopted in this paper. Although there are some 
differences between the Proto-Tai reconstruction by Li (1977) and that by Pittayaporn (2009), the 
differences do not affect the conclusion in this paper. 
Table 6: Onsets of Western Nung (Data from Hudak 2008) 


labial dental | palatal | velar glottal 
voiceless stops p t Cc k ? 
voiced stops b d 
aspirated p th ch kh 
nasals m n n y 
voiceless fricatives f S § h 
voiced fricatives Vv 6 
lateral 1 
approximants y 


Table 7 shows the codas in Western Nung. Unlike in Trang Dinh Nung, Western Nung has /?/ as a coda. 
Table 7: Codas of Western Nung (Data from Hudak 2008) 


p t k ? 
m n i 
Ww y yy 


Western Nung has six tones, which have developed from Proto-Tai tones, as shown in Table 8. Unlike 
in Trang Dinh Nung, the Proto-Tai D tone is split according to vowel length. 


Table 8: Pattern of the tonal split in Western Nung (Data from Hudak 2008) 


 | A | B | C | DS | DL
Voiceless friction sounds (*f-, *hm-, *pʰ-, ...) | 1 [14] | 2 [21] | 3 | 6 [55] | 2 [21]
Voiceless unaspirated stops (*p-, *t-, *k-, ...) | 1 [14] | 2 [21] | 3 | 6 [55] | 2 [21]
Glottal (*ɓ-, *ɗ-, *ʔ-, ...) | 1 [14] | 2 [21] | 3 | 6 [55] | 2 [21]
Voiced (*ɣ-, *m-, *b-, ...) | 4 [44] | 5 [31] | 6 [55] | 4 [44] | 5 [31]


The ‘1-into-2’ splits in Trang Dinh Nung are related to Proto-Tai tone. Tables 9 through 13 show etyma 
with Proto-Tai *b, *d, *ɟ, *g, and *ɣ. In these tables, Proto-Tai voiced stops (and *ɣ) are each split into 
two series according to Proto-Tai tones in Trang Dinh Nung, such as *b > /p, pʰ/, *d > /t, tʰ/, and *ɟ > 
/tɕ, s/. In contrast, Proto-Tai voiced stops do not split and are simply devoiced in Western Nung. 
Regarding *ɣ, it developed into /h/ in Western Nung. 
Table 9: Etyma with Proto-Tai *b 


Proto-Tai (Pittayaporn 2009)   Trang Dinh Nung   Western Nung (Hudak 2008) 
*be:A ‘raft’ pe? pee* 
*be:’ ‘expensive’ pen? pen* 
*bi:® ‘elder sibling’ pit pii> 

*bu:k? ‘pomelo’ puk* 


Table 10: Etyma with Proto-Tai *d 


Proto-Tai (Pittayaporn 2009)   Trang Dinh Nung   Western Nung (Hudak 2008) 
*da:4 ‘to smear’ ta2 
*da: ‘river’ that taa? 
*di? place’ thit tii 
*daw° ‘cane’ taw® 
*da:k? ‘land leech’ tak taak> 


Table 11: Etyma with Proto-Tai *ɟ 


Proto-Tai (Pittayaporn 2009)   Trang Dinh Nung   Western Nung (Hudak 2008) 
*yim* ‘to taste’ teim? cim* 
*yanB ‘to weigh’ sant can? 
*teB “to soak’ se* ci 
*ya:n© ‘elephant’ tean® caan? 
*ty:k? ‘rope’ teak* cik? 


Table 12: Etyma with Proto-Tai *g 


Proto-Tai Trang Dinh Nung Western Nung 
(Pittayaporn 2009) (Hudak 2008) 
*ge:n ‘stink bug’ ken? 
*ou:8 ‘pair’ ktu4 kuu? 
*gaw ‘owl’ kaw® 
*gap? ‘narrow’ kap* 
Table 13: Etyma with Proto-Tai *ɣ 


Proto-Tai (Pittayaporn 2009)   Trang Dinh Nung   Western Nung (Hudak 2008) 
*yo:4 ‘neck’ ko? hoo 
*yer® ‘shin’ khent 
*vam® ‘night’ k*§m4 ham4* 
*yo:I° ‘to hammer’ kon® 


Table 14 summarizes the splits of Proto-Tai voiced consonants in Trang Dinh Nung. Proto-Tai voiced 
stops in syllables that do not have the B tone are realized as simple voiceless consonants, whereas 
Proto-Tai voiced stops in syllables with the B tone are aspirated in Trang Dinh Nung. In the case of 
*ɟ, the reflex is /tɕ/ in syllables that do not have the B tone and /s/ in those with the B tone. 
As for *ɣ, although it is not a stop but a fricative in the Proto-Tai reconstruction, it also splits into /k/ and 
/kʰ/, similar to the voiced stop *g. We can therefore hypothesize that *ɣ merged into *g during the early 
stage. After that, it split into /k/ and /kʰ/. 


Table 14: Pattern of Proto-Tai voiced consonant splits in Trang Dinh Nung 


Proto-Tai    A, C, D    B 
*b           p          pʰ 
*d           t          tʰ 
*ɟ           tɕ         s 
*g           k          kʰ 
*ɣ           k          kʰ 
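
The pattern summarized in Table 14 can be restated as a simple lookup. The following is a minimal Python sketch under stated assumptions: the function name `trang_dinh_reflex` and the dictionary layout are illustrative choices made for this example and are not part of the original analysis; only the reflexes themselves are taken from Table 14.

```python
# Reflexes of Proto-Tai voiced initials in Trang Dinh Nung, as in Table 14.
# Each value is (reflex in A/C/D syllables, reflex in B syllables).
# The names SPLIT and trang_dinh_reflex are hypothetical, for illustration only.
SPLIT = {
    "*b": ("p", "pʰ"),
    "*d": ("t", "tʰ"),
    "*ɟ": ("tɕ", "s"),
    "*g": ("k", "kʰ"),
    "*ɣ": ("k", "kʰ"),
}


def trang_dinh_reflex(proto_initial: str, proto_tone: str) -> str:
    """Return the reflex of a voiced proto-initial given the Proto-Tai tone."""
    if proto_tone not in {"A", "B", "C", "D"}:
        raise ValueError(f"unknown Proto-Tai tone: {proto_tone!r}")
    plain, b_tone = SPLIT[proto_initial]
    # Only the B tone conditions the second member of the split.
    return b_tone if proto_tone == "B" else plain
```

Under this sketch, `trang_dinh_reflex("*d", "B")` yields the aspirated reflex, while the same initial with tone A, C, or D yields the plain voiceless one.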


3 Relationship between aspirated sounds and tone height 

The ‘1-into-2’ split in Trang Dinh Nung is very important for developing a better understanding of the 
Proto-Tai B tone. A similar split occurs in modern Beijing Mandarin: the ‘ping song ze bu song 
[平送仄不送]’ (level, aspirated / non-level, unaspirated) phenomenon. Middle Chinese voiced stops in 
syllables with the level tone became devoiced and aspirated, whereas the same initials in syllables with 
other tone categories are realized as unaspirated in modern Beijing Mandarin. For example, in Middle 
Chinese, 平 ping ‘level’ and 病 bing ‘sick’ both have the voiced stop *b; 平 ping ‘level’ has a level tone, 
whereas 病 bing ‘sick’ has a departing tone. In modern Mandarin, 平 ping ‘level’ is /pʰiŋ˧˥/; that is, *b 
is realized as /pʰ/. In contrast, 病 bing ‘sick’ is /piŋ˥˩/; that is, *b is realized as /p/, as shown in Table 15. 


Table 15: ‘Ping song ze bu song’ phenomenon in Chinese 


Character     Middle Chinese (Baxter and Sagart 2014)    Beijing Mandarin 
平 ‘level’    *bjaeng / level tone                       ping /pʰiŋ˧˥/ 
病 ‘sick’     *bjaeng / departing tone                   bing /piŋ˥˩/ 


The ‘ping song ze bu song’ phenomenon might have occurred because of pitch height. Although it is 
impossible to describe the pitch height of each tone in Middle Chinese concretely, some scholars such 
as Pulleyblank (1978:178) have pointed out that, based on historical documents, the level tone in Middle 
Chinese had a low pitch. Chen (2015:100) notes the tendency of voiced stops to cause breathy voice 
and the relationship between breathiness and low tone height. Thus, we can hypothesize that the level 
tone in Middle Chinese preserved the breathiness of voiced initials because it was low. 
4 Proto-Tai B tone 

Although not all scholars have categorized Proto-Tai as a tonal language, some scholars suggest that in 
Proto-Tai, pitch height was part of a tonal contrast. For example, Pittayaporn (2009:271) argues that 
the pitch height of the B tone was low. In contrast, Liao (2016) argues that the earlier stage of Proto- 
Tai had non-tonal structures (p. 120) and proposes that the B tone in the later stage of Proto-Tai should 
be high-falling due to influence from Middle Chinese as well as the typological collocation in Mainland 
Southeast Asian languages between tone pitch and phonation voice (p. 102). Considering the split of 
voiced stops in Mandarin, which is discussed in Section 3, the ‘1-into-2’ split in Trang Dinh Nung might 
suggest the Proto-Tai B tone was low. 


5 Conclusion 
Proto-Tai voiced stops became either simple voiceless stops or voiceless aspirated stops in most modern 
Tai varieties. In Trang Dinh Nung, in contrast, Proto-Tai voiced stops split into two consonants 
depending on the Proto-Tai tone. In syllables that do not have the B tone, they are realized as simple 
voiceless consonants, whereas in syllables that have the B tone they are realized as aspirated. 

This ‘1-into-2’ split in Trang Dinh Nung is very important for improving our understanding of the 
phonetic features of the Proto-Tai B tone. It may be evidence that the pitch height of the B tone in 
Proto-Tai was low. 
Abstract 

This article on Tai Meuay first discusses the ethnonym, attested in Vietnam (Nghệ An 
and Thanh Hóa) and Laos (Bolikhamxay). After introducing the language data available 
in the literature, it analyzes Tai Meuay data collected in the districts of Khamkeuth, 
Pakkading, and Viengthong of Bolikhamxay Province. It also takes into account speakers’ 
perception of their ethnic group and language. Regarding the Tai Meuay tonal diversity, 
this article identifies a main type with a 123-4 split in the A, B, C and DL columns of 
Gedney’s tone diagram, as against marginal types characterized by no split in DL. 
Although the considered varieties share lexical items with Tai dialects of Nghệ An, tonal 
features (e.g., absence of split in DS) and phonological features (e.g., the reflex of the 
initial *ɣ-) indicate that Tai Meuay is affiliated with Tai Daeng within the Southwestern 
Tai branch of the Tai language family. 


Keywords: Tai Meuay, Tai Thaeng, Tai Daeng dialects in Laos and Vietnam, 
ethnonymics, tone systems 
ISO 639-3 codes: tyr 


1 Introduction and literature overview 

This article on Tai Meuay,¹ both as an ethnonym and as a dialect belonging to the Southwestern Tai 
branch of the Tai language family, focuses on Tai Meuay varieties spoken in a province of Laos, 
Bolikhamxay Province.² Earlier discussions regarding the ethnonym ‘Tai Meuay’ in the literature relate 
it to a few areas of Vietnam (Dang 2010, Ferlus 2008, Guignard 1912, Robert 1941, Vi 1996) and Laos 
(Chamberlain 1984, Seidenfaden 1967). 

To start with Thanh Hóa Province in Vietnam, the French colonial administrator R. Robert 
(1941:8, 10) explains the context of the appellation “Tay Muoy” in his ethnographic notes about the 
“Tay Deng” (Tai Daeng) of Lang Chanh District. The “Jo” (Tai Yo), who live in Thuong Xuan District 
of Thanh Hóa and Quy Chau District of Nghệ An, are the southern neighbors of the Tai Daeng. The Tai 
Yo call the Tai Daeng “Tay Muoy”, and even the Tai Daeng often use that appellation to refer to 
themselves. “Tay Muoy” thus appears to be an exonym of the Tai Daeng and is neither their main 
autonym nor the name of the language they speak. That use of the appellation ‘Tai Meuay’ to refer to 


1 For a discussion regarding the term ‘Meuay’ and its use, see section 2. We owe the Romanized form used in 
this article, ‘Tai Meuay’, to James R. Chamberlain (1984). ‘Tai’ and ‘Tay’ representing [ta:j] and [taj] 
respectively in the Romanized orthography of Vietnamese and in some usages of Romanization followed in 
Laos, the initial component of the phrase ‘Tai Meuay’ should be spelled ‘Tay’. However, Chamberlain uses 
‘Tai’ instead of ‘Tay’ because the Romanized form ‘Tai’ is the one conventionally used when referring to the 
Tai language family. When Romanized forms for either the ethnonym of the Tai Meuay or their language 
name are referred to in relationship with the authors who use them, they are enclosed in double quotation 
marks. 

2 For toponyms of Laos, this article gives the Romanized forms commonly used in that country (e.g., the 
provinces Bolikhamxay, Houaphanh, and Khammouane; the districts Khamkeuth, Pakkading, and Viengthong). 
Spellings vary, and one may encounter forms such as Borikhamxay, Houaphan, Khammouan, and Khamkeut 
instead of the forms given above. As for places located in Vietnam, toponyms are given in Vietnamese 
script. 
the Tai Daeng is confirmed by the fieldwork conducted by Sam Công Danh, a graduate student at the 
University of Social Sciences and Humanities (Hanoi, Vietnam), in Thuong Xuan District. 

Some Tai Daeng of Xamtay District (Houaphanh Province, Laos), especially those living in Muong 
Pao Subdistrict, say that they migrated from places they call [muran“* de:n4*]° — the Tai Daeng name 
of Yén Khuong Commune in Lang Chanh — and [muwan“* mo:t?"*] — the Tai Daeng name of Bat Mét 
Commune in Thuong Xuan (Sam Công Danh, personal communication, March 24, 2021). As a result, 
the fact that Erik Seidenfaden (1967:91) mentions “Thai Mitioi” together with “Red Thai” (Tai Daeng) 
among the Tai groups inhabiting the region of Houaphanh does not come as a surprise. However, 
according to extensive interactions with Tai Daeng speakers in Houaphanh Province, not only in Xamtay 
District but also in Viengxay District and Sôp Bau District, the ethnonym ‘Tai Meuay’ seems not to be 
in use nowadays in Houaphanh Province. 

When it comes to the western part of Nghệ An Province, Théodore Guignard (1912:XXII), a 
missionary priest of the Missions étrangères de Paris based at Canh Trap (in the present-day Tuong 
Duong District) from 1892 to 1906, mentions “Thay Muoy” as a language name, and he lists “Thay 
Muoy” among a few dialects of the “Thay” language such as “Thay” itself, “Lao”, “Phu Thay”, and 
“Phudn”. Vi Van An (1996:30-32) mentions “80,000 Tai living in three districts: Con Cuong, Tuong 
Duong and Ky Son”. They belong to three main groups: the “Tai Muong”, or people who “belong to 
the Muong”; the “Tai Thanh”, or people whose ethnonym “could possibly have originated from their 
old homeland”, who might thus be linked to either “Thanh Hoa” or “Muong Thanh (Dien Bien Phu)”; 
and the “Tai Muoi”, or people having migrated from “Muong Muoi, which today belongs to the district 
of Thuan Chau, province Son La”. According to Dang Nghiêm Van (2010:80, 82), “Mường Muổi” was 
the capital of the Tai Dam after the 13th century, and the “Tay Muoi” are people who were displaced 
from “Mường Muổi” following a rebellion against Lê Lợi at the beginning of the 15th century. 

The Tai dialects of that region having been documented by Michel Ferlus (2008), one can see that, 
in his classification, there are only two groups of dialects for that region. 

• The first group consists of “Tay Yo” and “Tay Muong”, the latter being also called “Tay Pao” 
  in Tuong Duong District. The appellation “Tay Muong” can actually refer to both subdialects. 

• The second group comprises Tai Daeng varieties, also called “Tay Thanh” or “Tay Meuy”. 


If we follow Ferlus’ classification, the use of ‘Tai Meuay’ in Western Nghé An, ‘Tai Meuay’ being 
written either “Tai Muoi” or “Tay Meuy”, is the same as in Robert’s 1941 study: it is an appellation 
among other ethnonyms referring to the Tai Daeng. 

The last area for which the literature provides accounts about the Tai Meuay is the region of Lakxao 
(Khamkeuth District) in Laos,⁴ which used to belong to Khammouane Province, but has been under 
Bolikhamxay Province since 1986.⁵ The speakers whose varieties are dealt with by Chamberlain 
3 In order to note a syllable used in a variety belonging to the Southwestern Tai branch of the Tai family, whether 
it is discussed as a separate word or as a part of a compound or a sentence, this article uses a transcription 
based on the International Phonetic Alphabet. As most syllables which are discussed are expected to have 
cognates in other Tai languages, we have chosen not to note the tone of a given syllable in reference to the 
tone system of the variety in which it is used. Instead, we give the category of that syllable in the tone diagram 
devised by William J. Gedney (1972), which helps summarize Tai dialects’ tone systems. If we take the 
example of the syllable [mɯaŋᴬ⁴], its category, which is given in the superscript indication, is ‘A4’, with ‘A’ 
indicating its etymological tone, as reconstructed for Proto-Tai, and ‘4’ showing the consonant type to which 
its onset belongs. As the category of a given syllable in Gedney’s tone diagram remains the same (there are 
very few exceptions), one can easily find the tone of that syllable in varieties whose tone systems have already 
been documented. Figure 1 in subsection 3.1 reproduces Gedney’s tone diagram. 

4 Lakxao, which has developed since 1975, is the present-day center of Khamkeuth District. Nape used to be 
the main center of the region at the time of the French colonization. The toponym ‘Lakxao’ means ‘Kilometer 
20’ ([lakᴰˢ¹ saːwᴬ⁴]), the starting point being the historical administrative building of Nape, locally known as 
the ‘bungalow’. 

5 Roughly speaking, the present-day Bolikhamxay Province, created in 1986, comprises the territory of the 
former Borikhane Province, as well as areas formerly belonging to Khammouane Province (Khamkeuth 
District), to Vientiane Province, or to Xiengkhouang Province (a few subdistricts of Viengthong District). 
(1984:68-69) come from two villages of Khamkeuth District. The appellation ‘Tai Meuay’ appears to 
be the only ethnonym of those informants and is the name of their language. 
The Tai Meuay spoken in Khamkeuth District was further investigated by a few researchers. 


A variety spoken in the village of Phônsy (Khamkeuth)⁶ was documented in 1993 by 
Thongpheth Kingsada and Michel Ferlus, who recorded both “the Matisoff 200-wordlist” and 
“the Gedney tone checklist” for Pranee Kullavanijaya and Theraphan L-Thongkum.⁷ The two 
linguists of Chulalongkorn University (Thailand) used the language data of that recording in an 
article in which they deal with dialects belonging to the Central and Southwestern branches of 
the Tai family (Kullavanijaya and L-Thongkum 1998). 

Sasithorn Onlao (2010a) investigated a part of Khamkeuth District which she identifies as the 
“Nam Paw basin” for her master’s thesis, completed at Mahasarakham University 
(Thailand). Her study introduces six Southwestern Tai dialects spoken in the area: 
“Tai-thaeng”, “Tai-mot”, “Tai-moe”, “Phoo-thai”, “Nho”, and “Laos”. The Tai Meuay variety 
she discusses is spoken in Namphao Village.⁸ The data collected by Onlao are referred 
to in an article in which she focuses on the tone merger and split patterns of the dialects she 
deals with (Onlao 2010b). 

Souksada Soutthixay (2016-2017), a native of the previously mentioned Phônsy (Khamkeuth) 
Village, discussed “570 basic words of Meuy language” in the dissertation she submitted for 
the completion of her bachelor’s degree (Department of Lao Language and Culture, Faculty of 
Letters, National University of Laos). Furthering her studies in folk literature at the master’s 
degree level, she analyzed a corpus of thirty stories (Soutthixay 2019-2020). 


In addition, Tai Meuay living in other areas of the present-day provinces of Bolikhamxay and 
Khammouane are mentioned in the following accounts. 


Chamberlain (1991:103) refers to “Sop Vieng” as “a Tai Moey /méy C1/ village”. Sôp Vieng 
is located “near the old LS 28 airstrip at Ban Done”, that is, in the present-day Chomthong 
Subdistrict (Viengthong District, Bolikhamxay Province). 

Chamberlain (1996:11) speaks of the “Moey” in Nakai Plateau (Khammouane Province) as one 
of the “Tai speaking groups [which] have recently moved to the plateau from Khamkeut district 
in Borikhamxay”. 

Joachim Schliesinger (2003:175) lists a few villages inhabited by Tai Meuay in the districts of 
Paksane and Pakkading. 


Like the previously mentioned studies, this article discusses data elicited from Tai Meuay speakers 
of Khamkeuth District. Tai Meuay data collected in two other districts of Bolikhamxay Province, 
Pakkading and Viengthong, are also included. 


2 The ethnonym ‘Meuay’: a linguistic analysis 

Although Robert (1941:8) explains the context of the ethnonym “Tay Muoy”, his 
account lacks linguistic information regarding the syllable “Muoy” he gives and its meaning. Robert’s 
transcription of the Tai Daeng varieties spoken in “Muong Chéng” ([muan“* ce:n], the Tai Daeng 
name of Lang Chanh) and “Muong Déng” ([mutan** de:n“*]) is rather accurate for lexical items and 


Ironically, Khammouane, the village to which Khammouane Province owes its name, being located in 
Khamkeuth District, is nowadays in Bolikhamxay Province. 

6 As a matter of coincidence, there are Tai Meuay locations in Khamkeuth District, e.g., Phônsy, Phônxay, 
Nakhua, having the same names as Tai Meuay locations in Pakkading District. As this article mentions Phônsy 
and Phônxay near Lakxao, as well as Phônsy and Phônxay in Pakkading District, the former are referred to as 
Phônsy (Khamkeuth) and Phônxay (Khamkeuth) and the latter as Phônsy (Pakkading) and Phônxay 
(Pakkading). 

7 Thongpheth Kingsada and Michel Ferlus’ recording is now an open access resource in the Pangloss Collection 
(https://pangloss.cnrs.fr/). 

8 Namphao (or “Nam Paw”, as spelled by Sasithorn Onlao), in which -phao represents the Lao syllable [p"a:w"“], 
is the name of both the river Namphao in Khamkeuth District and the village Namphao named after it. 
their tones in his wordlist and lexicon (Robert 1941:128-139, 141-164), but it is less reliable when it 
comes to ethnonyms. For example, Robert uses the spelling “Jo” to refer to the ethnonym of the Tai Yo 
[taj“* jo:“], but the expected form in his system of transcription is actually “Jo”. As for “Muoy”, the 
absence of tonal mark supports either an A4 syllable or a B123 syllable of the Southwestern Tai tone 
system. The interpretation of this article is that “Muoy” is an A4 syllable of Southwestern Tai, not a 
B123 syllable. When writing “Muoy”, Robert actually renders in Tai Daeng the Vietnamese designation 
‘Muoi’ of the Tai Meuay since A4 syllables of Southwestern Tai correspond to A2 syllables of 
Vietnamese. 

With respect to the form “Muoy” given by Guignard (1912:XXII), its tone mark consistently 
characterizes C1, C2, and C3 syllables of Southwestern Tai in his Lao-French dictionary and has 
nothing to do with the way it is used in Vietnamese.¹⁰ Chamberlain (1984:68) is the first to explicitly 
posit “/mɯaj C1/”, a C1 syllable in the Southwestern Tai tone system. Ferlus (2008:299) also gives a 
C1 syllable, “[məːjᶜ¹]”,¹¹ in which one will notice the reduction of the diphthong. Giving the exact tone 
of the ethnonym ‘Meuay’ in Southwestern Tai remains all the more relevant as it cannot be determined 
from the usage in Vietnamese and in Lao. 

• In Vietnamese, we find, as mentioned above, ‘Muoi’, an A2 syllable in the Vietnamese tone 
  system. The diphthong is not simplified. The expected form ‘Mui’, a Vietnamese B1 syllable 
  being the equivalent of a Southwestern Tai C1 syllable, is however the correct form, according 
  to Vi (1996:38, and personal communication, November 11, 2018) of the Vietnam Museum of 
  Ethnology, who is himself a member of a Thai group of Vietnam.¹² 

• In Lao, one consistently finds [mɤːjᶜ⁴]. Whereas Southwestern Tai C1 syllables are 
  pronounced with a rising tone in Tai Meuay, they are pronounced with a low-falling tone in 
  Lao. As a result, Lao speakers are likely to interpret a C1 syllable of Tai Meuay such as [mɤːjᶜ¹] 
  as a C4 syllable of Lao. The diphthong is simplified. 


2.1 The diphthong in the ethnonym ‘Meuay’ and its reduction 
As can be seen from the forms mentioned above, the vowel in the term ‘Meuay’ is either the diphthong 
/-ɯa-/ or the monophthong /-ɤː-/. Three phonological rising diphthongs are found in Southwestern Tai 
languages: /-ia-/, /-ɯa-/, and /-ua-/. There are a few Southwestern Tai languages in which those 
diphthongs are simplified, such as Tai Don, Phu Thai, and others.¹³ In addition, Ferlus (2008:310) 
mentions the case of the “Tay Muong” spoken in Tuong Duong District (Nghệ An Province) and “Tay 
Maen”, better known in the literature as “Mène”, a Tai dialect of Khamkeuth District described as 
“originally spoken in Nghệ An” (Chamberlain 1991): diphthongs in “Tay Muong” and “Tay Maen” are 
simplified in front of semivowels.¹⁴ 

When it comes to the Tai Meuay varieties spoken in Bolikhamxay Province, the data we have show 
that there are very few words in which the diphthong is simplified. Then, the diphthong which is 


9 Robert’s transcription is clear and consistent for the tones in the A123, B4, C123, C4, and DL4 boxes of the 
tone diagrams used to summarize tone systems of dialects belonging to the Southwestern Tai branch of the 
Tai family. For a more detailed introduction to tone diagrams, see subsection 3.1. 

10 Guignard (1912:XX) calls his transcription of Lao “quốc ngữ laotien”. That Quốc ngữ for Lao displays the 
same tone marks as the Quốc ngữ for Annamite, but uses them in a different way. 

11 Ferlus (2008:309) lists the word “[məːjᶜ¹]” among a few lexical units of Tai Yo having no cognates in other 
languages of the Tai family. That remark reminds us that, even though it is possible to posit a C1 syllable of 
Southwestern Tai for that word on the basis of its use in a few languages such as Tai Yo and Tai Meuay, it 
cannot be taken into account in discussions regarding the Southwestern Tai branch of the Tai language family 
as a whole or, from another perspective, Proto-Southwestern Tai etyma. 

12 In Vietnam, ethnographic and linguistic accounts use the form ‘Thai’ to refer to groups speaking Southwestern 
Tai dialects. As for the form ‘Tay’, it is used for groups speaking Central Tai dialects. 

13 In Phu Thai dialects, the rising diphthongs /-ia-/, /-ɯa-/, and /-ua-/ are consistently replaced by the long 
monophthongs /-eː-/, /-ɤː-/, and /-oː-/. 

14 According to Frédéric Pain (personal communication, January 25, 2021), diphthongs in the Tai Muong of 
Tuong Duong, also called Tai Pao, can be simplified in front of [-w] and [-j], but such a reduction is not “an 
absolute rule”, and the same speaker can say either [kuaj°'] or [ko:j°] for the lexical item ‘banana’. 
simplified is always /-ɯa-/, and its reduction occurs in front of [-j], e.g., [mɤːjᴮ⁴] ‘to be tired’, [ʔɤːjᶜ³] 
‘elder sister’, as against [mɯajᴮ⁴] ‘to be tired’, [ʔɯajᶜ³] in Lao. Finally, only the ethnonym ‘Meuay’, 
which is used as a separate word or as a final component in a phrase, displays either a diphthong or a 
monophthong. In the ethnonym, /-ɯa-/ and /-ɤː-/ somehow lose their phonological status and become 
free variants, hence [mɯajᶜ¹] or [mɤːjᶜ¹] ‘Meuay’; [tajᴬ⁴ mɯajᶜ¹] or [tajᴬ⁴ mɤːjᶜ¹] ‘Tai Meuay’; [saːwᴬ¹ 
mɯajᶜ¹] or [saːwᴬ¹ mɤːjᶜ¹] ‘Meuay young woman’. It should be noted that, although speakers freely use 
[mɯajᶜ¹] or [mɤːjᶜ¹], an overwhelming majority of the twelve speakers from different locations with 
whom we could discuss the term ‘Meuay’ for this research use [mɯajᶜ¹] when repeating the word or 
when asked to pronounce it in citation form. 


2.2 Another use of the term ‘Meuay’ 

This article has so far dealt with the term ‘Meuay’, a C1 syllable posited by Chamberlain (1984) and 
Ferlus (2008), as an ethnonym only. In addition, we have accounts which relate that term to places of 
origin of the Tai Meuay, either in Vietnam (Dang 2010, Vi 1996) or in Laos (Schliesinger 2003).¹⁵ 
However, neither such accounts nor the discussion of the forms [mɯajᶜ¹] and [mɤːjᶜ¹] in the preceding 
subsection has taken into account another use of that term. 

Apart from being used as a separate word or as the final component of a phrase, positions in which 
the term ‘Meuay’ is indeed the ethnonym of the Tai Meuay, it is found in the initial position of a phrase, 
especially when used as the initial component in pronominal compounds. The term ‘Meuay’ then 
denotes plurality, as in [my:j“' ?em“] ‘we (as many people)’, [my:j“! t"aw“*] ‘you (as many people)’, 
[my:j“' sa:“4] ‘they (as many people)’.'® In that position, all the speakers are found to use the form 
[mɤːjᶜ¹], with the reduction of the diphthong, except one speaker who consistently uses [mɯajᶜ¹]. When 
asked to pronounce the syllable [mɤːjᶜ¹], used as the initial component of a pronominal compound, in 
citation form, seven of the twelve speakers use [mɯajᶜ¹]. 

Some of the speakers when discussing the term ‘Meuay’ make the following statement: “the (Tai) 
Meuay are us” — [(taj4*) mutaj©! me:n®4 my:j°! tu:47] ((Tai —) Meuay — be — plural — us). That statement 
as well as the translation of ‘Meuay’ as [pa? sa:“* son“‘] ‘citizen, people’, which was proposed by one 
speaker in another discussion, suggest that Tai Meuay speakers do not separate the ethnonym ‘Meuay’, 
pronounced [mɯajᶜ¹] or [mɤːjᶜ¹], from the use of [mɤːjᶜ¹] to denote plurality. 


2.3 The syllable [mɯajᶜ¹] and its semantic equivalent [puakᴰᴸ⁴] 

This article, which already follows Chamberlain’s 1984 account by using the Romanized form ‘Tai 
Meuay’, will also follow his account by positing the phonological syllable “/mɯaj C1/” that he gives, 
with the diphthong /-ɯa-/, for both phonetic syllables [mɯajᶜ¹] and [mɤːjᶜ¹], for the three positions they 
can occupy in a sentence (an initial component of a phrase, a final component of a phrase, or a separate 
word), and for both the meanings (an ethnonym and a pluralizer) discussed above. 

The use of [mɯajᶜ¹] as a pluralizer and the related meaning ‘group, people’ it implies (Ferlus 
2008:299, 309) are worth taking into account in the present discussion, because they do not support 
toponymic interpretations of ‘Meuay’ such as the ones referred to above. Furthermore, [mɯajᶜ¹] in 
that use is the semantic equivalent of [puakᴰᴸ⁴] (pronounced [pʰuakᴰᴸ⁴] in Thai and Lao), and 
[puakᴰᴸ⁴] is used to name some ethnic groups as well. According to Chamberlain (personal 
communication, March 18, 2021), [puakᴰᴸ⁴] “was used for Kra peoples and for some Austroasiatic 
groups, apparently as a marker of lower social status, as in its use in reference to the Ksing Mul in the 
Tai Dam feudal system”. 

As this article proposes to understand [mɯajᶜ¹] in comparison with its semantic equivalent 
[puakᴰᴸ⁴], one will note that, apart from the fact that the term [mɯajᶜ¹] has the same uses as [puakᴰᴸ⁴], 


15 According to Schliesinger (2003:175), the place of origin of the “Tai Meuy” is “Muang Meuy, west of Hua 
Phan province in the most eastern part of Luang Prabang province”. A few speakers interviewed during this 
research stated that they had migrated from Luang Prabang three hundred years ago. However, they could not 
say from which part of Luang Prabang Province they arrived and did not mention a place called “Muang 
Meuy”. 

16 Ferlus (2008:307) notes the same use of “[məːjᶜ¹]” in Tai Yo. 
a remark by Robert suggests that, in the relationship between the Tai Yo and the Tai Daeng, the latter 
could be in particular cases the ones with an inferior status. According to Robert (1941:10), the “Jo” 
occupied the best lands in some areas of Thuong Xuan, and their chiefs would bring in “Tay Deng” to 
cultivate higher valleys. The use of the appellation “Tay Muoy” by the Tai Yo when they refer to the 
Tai Daeng, as it is noted by Robert (1941:8), would thus be explained by a local context in which the 
Tai Daeng would be a labor force available for the Tai Yo. 


3 The Tai Meuay language data for Bolikhamxay Province in the literature 

The overview of the literature in the introduction of this article (see section 1) mentions fifteen accounts 
in which the Tai Meuay are dealt with in one way or another. Five of those accounts (Chamberlain 
1984, Kullavanijaya and L-Thongkum 1998, Onlao 2010a, Onlao 2010b, Soutthixay 2016-2017) are 
linguistic accounts dealing specifically with varieties spoken by the Tai Meuay in Khamkeuth District. 
As Onlao’s (2010b) article is nothing more than a summary of her master’s thesis (Onlao 2010a), and 
as we could not consult Soutthixay’s 2016-2017 dissertation, the following presentation builds on the 
other three accounts. 


3.1 Chamberlain’s 1984 account 

Chamberlain’s 1984 account is a short introduction dealing with the Tai Meuay varieties spoken in two 
villages, Keng Bay and Nava, of Khamkeuth District. In an area described as “linguistically very rich” 
(Chamberlain 1984:62), Tai Meuay is “quite widespread” (Chamberlain 1984:68). The present research 
confirms that description, especially if one compares Tai Meuay with other dialects of the same region 
which are spoken in a few villages or even in one village only.¹⁷ 

Chamberlain’s account introduces Tai Meuay along with other dialects, a good number of them 
being spoken in Khamkeuth District or in neighboring districts belonging to the province of 
Khammouane as it was before 1986. All the dialects he discusses belong to the two groups he had earlier 
proposed in his classification of the Southwestern Tai branch of the Tai language family, the P group 
and the PH group (Chamberlain 1972, 1975).¹⁸ The five dialects of the P group Chamberlain takes into 
account are “Tai Meuay”, “Tai Khang”, “Tai Kuan”, “Tai Maen”, “Tai Pao” (Chamberlain 1984:66-70). 
Information related to Tai Daeng, a P language which is not spoken in that region, is included “for 
comparison with Tai Meuay” (Chamberlain 1984:69). The dialects of the PH group included in 
Chamberlain’s account are four dialects of the “PH group Neua-Phuan languages”, namely, “Tai 
Nheuang”, “Tai Kaloep”, “Tai Nho”, “Phu Tai” (Chamberlain 1984:70-76), as well as four dialects of 
the “PH group Lao-Southern Thai languages”, including “Yooy”, “Kaleung”, “Tai Bo”, “Yo” 
(Chamberlain 1984:76-82). Chamberlain lists a few “phonological characteristics” and “lexical 
characteristics” for each of those dialects. 


17 Among the languages which are discussed in this article, Tai Meuay is spoken in many locations of Khamkeuth 
District. As for Tai Thaeng (see subsection 4.4), it is less widely spoken. When it comes to languages spoken 
in one village of Khamkeuth District only, a good example is Saek, which is spoken in Nakadôk Village. 
Chamberlain (1998) was the first to mention Nakadôk and its distinct Saek dialect. Although a few Saek 
families can be found in other villages of that area and in Lakxao City itself, all of them come from Nakadôk. 

18 According to Chamberlain (1975:62), the Southwestern Tai branch of the Tai language family was divided 
around the 8th century into two groups, which subsequently evolved independently from each other. Among 
the features of the two groups’ divergent evolution, Chamberlain focuses on the devoicing of voiced initial 
stops reconstructed for Proto-Tai (*b-, *d-, *j-, *g-) and names both groups according to the reflexes [p-] or 
[pʰ-] of the Proto-Tai consonant *b-, hence the P group and the PH group. Other specialists of Tai historical 
linguistics disagree with Chamberlain’s dating of the previously mentioned devoicing sound shift and thus 
reject his classification. Gedney (1991:208), for example, states that “the trouble with this classification is that 
it uses as its basic criterion something very late in the history of these languages, but Chamberlain wants to 
make it very early”. 
• Starting with the phonological characteristics listed by Chamberlain, one will note that the five 
  dialects of the P group share, as expected, the reflex [k-] in words reconstructed with the initial 
  *g-.¹⁹ The reflex [kʰ-] in words reconstructed with the initial *ɣ- characterizes Tai Meuay. As for 
  the rhyme *-aɯ, its reflex in Tai Meuay is [ɤː]. 


• Among lexical characteristics, one will notice the lexical items “paa (paah)” (‘to go’), “ki” (‘to 
  eat’), “ʔet” or “ʔeʔ” (‘to do’). 


Tone systems of the varieties dealt with for each dialect are introduced separately. They are summarized 
in tone diagrams adapted from the tone diagram devised by William J. Gedney (1972:434). Figure 1 
shows Gedney’s tone diagram with its five columns, representing the tones reconstructed for Proto-Tai, 
and its four rows, referring to initial consonant types. Devised to display “a maximum of possible tonal 
distinctions resulting from the various types of tonal splits that has been described”, Gedney’s tone 
diagram helps compare Southwestern Tai varieties’ tone systems and identify tone splits and mergers 
which are relevant for dialect differentiation. 


Figure 1: Gedney’s tone diagram for Tai dialects (following Gedney 1972:434). 


Proto-Tai Tones (columns): A, B, C (smooth syllables); D-short, D-long (checked syllables) 

Initials at time of tonal splits (rows): 
1 Voiceless friction sounds 
2 Voiceless unaspirated stops 
3 Glottal 
4 Voiced 


Chamberlain (1972, 1975) changed the layout of the columns in Gedney’s diagram to A, B, C, DL (D- 
long), and DS (D-short). Chamberlain’s layout is followed in all the tone diagrams which were created 
for this article. 

Chamberlain (1984:67) proposes two tone diagrams for Tai Meuay on the basis of the data he 
elicited from two male speakers, one aged 65 from Nava Village and another one in his seventies from 
Keng Bay Village. In Figure 2, we have adapted his tone diagram for the Keng Bay variety in order to 
display an interpretation using tone numerals²⁰ of the tones in that Tai Meuay variety.


19 With regard to the reflex [tɕ-] in words reconstructed with the initial *dz- (Chamberlain 1984:66), Tai Meuay
shares it with languages of the P group, such as Tai Dam, Tai Don, and Tai Daeng. However, the Tai Maen
and Tai Pao varieties studied by Chamberlain, although they belong to the P group, are characterized by the
reflex [s-].

20 In the tone diagrams which were created for this article, tones are represented with tone numerals. This notation
of tones, devised by Yuen Ren Chao (1930), focuses on the pitch, which it indicates on a five-point scale, with
1 being the lowest pitch and 5 being the highest. In order to describe a particular tone’s pitch and contour, tone
numerals appear as sequences of numbers representing the starting point, change points (if any), and the end
point of the concerned tone’s fundamental frequency (F0) curve.
Figure 2: The tone diagram for Keng Bay Tai Meuay with tone numerals (following Chamberlain
1984:67).

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


Summarizing tonal features of the Keng Bay variety, we have six tones in the A, B, and C columns and 
three tones in the DL and DS columns. In his introduction to the “P Group Languages”, Chamberlain 
(1984:68) notes a “tone splitting” which “took the form 123-4” in most dialects of that group, and we 
can observe such a 123-4 split in the columns displaying tone splits (A, B, C, DL). In comments below 
his tone diagram for Keng Bay Tai Meuay, Chamberlain further notes the “B-DL coalescence” (B = 
DL, with B123-4 and DL123-4, B123 [554] = DL123 [554] and B4 [224] = DL4 [224]), as well as the 
“lack of splits in DS column” (DS1234 [34]). The three types of shading displayed in our adaptation of 
Chamberlain’s tone diagram reflect the latter two sets of features, which will be summarized as B = DL 
(with B123 = DL123, B4 = DL4) and DS1234. Chamberlain’s diagram for Keng Bay is also
characterized by creakiness in the low-falling tone of C4 [221]. Finally, one will note that, although
both B4 [224] and C123 [445] have rising tones, the pitch heights of those tones are different, and there
is no coalescence.
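
The features just summarized for Keng Bay can also be checked mechanically. The following is only a minimal sketch: the dictionary layout and the function names are illustrative assumptions, not notation used by Chamberlain or in this article, and only the boxes whose tone numerals are quoted in the text are included.

```python
# Tone numerals for the Keng Bay boxes quoted in the text (after Chamberlain 1984:67).
# A box name such as "B123" bundles rows 1-3 of column B. This layout is an
# illustrative assumption, not the author's own notation.
KENG_BAY = {
    "B123": "554", "B4": "224",
    "C123": "445", "C4": "221",
    "DL123": "554", "DL4": "224",
    "DS1234": "34",
}


def column_of(box):
    """Return the column label of a box name, e.g. 'DL123' -> 'DL'."""
    return box.rstrip("0123456789")


def has_split(diagram, column):
    """A column is split when it is represented by more than one box."""
    return sum(1 for box in diagram if column_of(box) == column) > 1


def coalesce(diagram, box_a, box_b):
    """Two boxes coalesce when they carry the same tone numerals."""
    return diagram[box_a] == diagram[box_b]


if __name__ == "__main__":
    print("B split:", has_split(KENG_BAY, "B"))
    print("DS split:", has_split(KENG_BAY, "DS"))
    print("B123 = DL123:", coalesce(KENG_BAY, "B123", "DL123"))
    print("B4 = DL4:", coalesce(KENG_BAY, "B4", "DL4"))
    print("B4 = C123:", coalesce(KENG_BAY, "B4", "C123"))
```

Strict equality of tone numerals is a deliberately narrow test; the quasi-coalescences discussed for other varieties would need a looser comparison and, ideally, speakers' auditory judgements.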

Figure 3 shows Chamberlain’s tone diagram for the Nava variety, with an interpretation using tone 
numerals of the tones in that variety. 


Figure 3: The tone diagram for Nava Tai Meuay with tone numerals (following Chamberlain 
1984:67). 


[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


The Nava variety has five tones in the A, B, and C columns and two tones in the DL and DS columns.
Chamberlain’s comments related to the tone diagram for Keng Bay Tai Meuay apply to the tone diagram
for the Nava variety as well. There is the same lack of splits in the DS column (DS1234 [33]). However,
when it comes to the B-DL coalescence, that feature is not the same for the two varieties: in the Nava
variety, neither B nor DL displays a split (B1234 [443] = DL1234 [443]). Two types of shading in our
adaptation of Chamberlain’s tone diagram for the Nava variety reflect those sets of tonal features.
As commented by Chamberlain (personal communication, March 18, 2021), “there seemed to be
two dialects”. The absence of splits characterizing the B and D columns in the tone diagram given for
Nava Tai Meuay is noteworthy, because it can also be found in tone diagrams available in the literature
for a “Tai Pao” variety (Chamberlain 1984:67) and a “Méne” variety (Chamberlain 1991:108), both
spoken in the present-day districts of Khamkeuth and Viengthong of Bolikhamxay Province,²¹ as well
as for the “Tay Muong” variety spoken in the Tương Dương District of Nghệ An Province (Ferlus
2008:310). In addition, tone shapes in the Nava variety tone diagram are rather similar to tone shapes
in tone diagrams available for those dialects. However, tone diagrams for the Tai Pao and the Méne of
Bolikhamxay and for “Tay Muong” of Nghệ An regularly display a split in the DS column, unlike tone
diagrams for the Tai Meuay varieties of Keng Bay and Nava.


3.2 Kullavanijaya and L-Thongkum’s 1998 article 

As previously mentioned (see section 1), the “Tai Moei” variety discussed in these authors’ article is a
variety spoken in Phônsy Village (Khamkeuth). The speaker, a female aged 39, was recorded on the 6th
of January 1993 by Thongpheth Kingsada and Michel Ferlus.

Whereas Chamberlain (1984:68) notes the prevalence of a 123-4 split for each column in languages 
belonging to the P Group, Kullavanijaya and L-Thongkum (1998:284) speak of the 123-4 split for the 
A and C columns in “Tai Dam, Tai Don, Tai Lue, and Tai Daeng (including Tai Phoeng and Tai 
Moei)”.²² As a matter of fact, their focus on the A and C columns is relevant when one compares those
languages of the P Group with languages such as Lao, which has A1-23-4 and C1-234, or Thai, which 
has A1-234 and C123-4. 
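
The split notation used in this comparison (e.g., A1-23-4) can be read as a partition of the four rows of a column. A minimal sketch of that reading follows; the function names and the `SYSTEMS` table are hypothetical illustrations, and the patterns are those quoted in the paragraph above.

```python
def parse_split(pattern):
    """Parse a split pattern such as 'A1-23-4' into ('A', [[1], [2, 3], [4]])."""
    column = pattern.rstrip("0123456789-")
    groups = pattern[len(column):].split("-")
    return column, [[int(digit) for digit in group] for group in groups]


# Patterns quoted in the text for the A and C columns.
SYSTEMS = {
    "P group (Tai Dam, Tai Don, Tai Lue, Tai Daeng)": ["A123-4", "C123-4"],
    "Lao": ["A1-23-4", "C1-234"],
    "Thai": ["A1-234", "C123-4"],
}


def has_123_4_in_all(patterns):
    """True when every listed column shows the 123-4 split."""
    return all(parse_split(p)[1] == [[1, 2, 3], [4]] for p in patterns)


if __name__ == "__main__":
    for name, patterns in SYSTEMS.items():
        print(name, "->", has_123_4_in_all(patterns))
```

Only the P group entry shows the 123-4 split in both columns, which is the contrast the A and C columns are meant to bring out.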

In addition, according to the authors, “Tai Daeng and closely related dialects, Tai Phoeng and Tai 
Moei, have a further step of tone merger, i.e., B4 merges with C123”. That B4-C123 coalescence is 
noted for Tai Daeng by Gedney (1989:421, 423, 425) and is always confirmed in interactions with Tai 
Daeng speakers, whether in Houaphanh Province or in Thanh Hoa Province. As for Chamberlain 
(1984:69), he speaks of a “C123-B4/DL4 coalescence” for Tai Daeng, but he does not mention it for 
Tai Meuay, that particular coalescence being absent in the tone diagrams he gives for the Keng Bay and 
Nava varieties. 

The authors do not propose a tone diagram for the “Tai Moei” of Phônsy (Khamkeuth) they refer
to in their article. However, their article (Kullavanijaya and L-Thongkum 1998:285–286) contains the
following fourteen lexical items used in that variety whose tones are noted with tone numerals: [by²⁴³]
‘leaf’, [bin²⁴³] ‘to fly’, [din²⁴³] ‘soil’, [dan²⁴³] ‘nose’, [daau²⁴³] ‘star’, [phom²⁴³] ‘hair’, [buron²⁴³] ‘moon’,
[sai bui²⁴³] ‘navel’ for the A123 box; [baan**] ‘village’, [phum*?] ‘bee’ for the C123 box; [bok**]
‘flower’, [ka duk**] ‘bone’, [pik**] ‘wing’ for the DL123 box; [naak”*] ‘otter’ for the DL4 box.²³ As the
lexical items given for the DL123 and the DL4 boxes support a split in the DL column, the tone merger
and split pattern in Phônsy (Khamkeuth) could be of the Keng Bay type rather than of the Nava type.


21 While the Méne and Tai Pao varieties can be said to be spoken in Khamkeuth and Viengthong, Méne is
definitely more widespread than Tai Pao.

22 “Tai Phoeng, a branch of Tai Daeng, can be found in Muong Kham, Xiangkhuang province” (Kullavanijaya
and L-Thongkum 1998:285).

23 The main purpose of the authors when they list such lexical items is to show that, although the dialects they
identify as “Tai Phoeng” and “Tai Moei” appear to be “closely related” to Tai Daeng, “the different patterns
of consonant changes can be used as criteria for separating Tai Phoeng and Tai Moei from Proper Tai Daeng
and from each other” (Kullavanijaya and L-Thongkum 1998:285). For example, “Tai Moei” lexical items
such as [by²⁴³] ‘leaf’, [buron²⁴³] ‘moon’, [din²⁴³] ‘soil’, [phom²⁴³] ‘hair’, etc., whose respective initials are
reconstructed as *ʔb-, *ʔbl-, *ʔd-, and *ph- in Proto-Southwestern Tai, do not display the reflexes [v-] (for
both *ʔb- and *ʔbl-), [l-] (for *ʔd-), and [f-] (for *ph-), which characterize many varieties of Tai Daeng.


3.3 A first summary of the criteria provided by the literature

Chamberlain’s 1984 account and Kullavanijaya and L-Thongkum’s 1998 article provide criteria that
help identify Tai Meuay varieties, and a linguistic analysis will thus check the following characteristics
in each variety.

• Phonological characteristics (e.g., the reflex of the initial *ɣ- and the reflex of the rhyme *-aɯ);

• Lexical characteristics (e.g., Tai Meuay’s specific lexical items for ‘to go’, ‘to eat’, or ‘to do’);

• Tonal characteristics (e.g., 123-4 split in columns displaying tone splits, especially in the A and
C columns; B-DL coalescence; lack of splits in the DS column; B4-C123 coalescence).

For two of the tonal characteristics which have just been mentioned (B-DL coalescence, B4-C123
coalescence), the literature provides a counterexample in Onlao’s 2010 study.


3.4 Onlao’s 2010 dissertation 

The Tai Meuay data in Onlao’s study were elicited from a female speaker, aged 65, living in Namphao, 
a village on the outskirts of Lakxao City, where a bridge on the road from Lakxao to Nakai crosses the 
Namphao River. According to the information gathered in the area during this research, Namphao 
cannot be considered as a Tai Meuay location. However, as many Tai Meuay families have come to 
live in Lakxao City and the villages which belong to it (e.g., Nongpong and Sômsanouk), a few Tai
Meuay speakers can be found in Namphao. 

The description of the tones which the author proposes for the Namphao variety is based on
measurements of their fundamental frequency (F0) on the hertz (Hz) scale, which she summarizes by
giving a figure displaying the F0 curves of the tones in the appendix of her dissertation (Onlao
2010a:189). Figure 4 shows a tone diagram for Namphao, with an interpretation, using tone numerals,
of the tones in that variety.


Figure 4: The tone diagram for Namphao Tai Meuay with tone numerals based on the F0
measurements of the tones, as given by Onlao (2010a:189).

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


The Namphao variety has five tones in the A, B, and C columns and two tones in the DL and DS 
columns. While a 123-4 split can be observed in A, B, and C, there are no splits in either DL or DS 
(DL1234 [324], DS1234 [35]). As we have a split in B (B123-4) and no split in DL (DL1234), there is 
no B-DL coalescence in this tone diagram (B ≠ DL). In addition, the tone shape in DL1234 [324] cannot
be related to either the tone shape in B123 [442] or the tone shape in B4 [112]. When it comes to the 
tones in the B4 and C123 boxes, although the tones in B4 [112] and C123 [445] are rising tones, their 
tone numerals indicate quite different pitch heights, and there is no coalescence. A distinctive feature 
of this tone diagram, according to Onlao (2010a:86), is the B4-C4 coalescence (B4 [112] = C4 [112]). 
Could that B4-C4 coalescence noted for Namphao Tai Meuay possibly be used as a criterion when 
dealing with other varieties? In the tone diagram given for the Namphao variety, the sets of tonal 
features given as B4 = C4 and DS1234 are reflected by two types of shading. 


4 The investigation and the Tai Meuay language data of this research 

From October 2010 to May 2012, the author of this article was based in Kaysone Phomvihane City 
(Savannakhet Province, Laos) and had the opportunity to work on students’ linguistic autobiographies 
together with a Lao instructor teaching French at Savannakhet University. Although the final article
dealing with that study focuses on Phu Thai students of the provinces of Khammouane and Savannakhet
(Pacquement and Phongphanith 2012), four Tai Meuay students from Bolikhamxay Province
(one from Khamkeuth District, two from Viengthong District, and one from Pakkading District)
participated at an earlier stage of the research and provided data. The student from Khamkeuth District
provided a great deal of additional information about her village, Vangko, other villages in the 
surroundings, and the Tai Meuay language spoken there during in-depth interviews conducted in May 
2012 and further exchanges which took place later in December 2015. 

The author of this article eventually visited the region of Lakxao in July 2016 and returned to
stay there in November 2016, November 2017, and March 2018. He also visited a few locations
in Viengthong District (November 2017) and Pakkading District (January 2020). Whereas in
Khamkeuth District, as well as in some parts of Viengthong District, Tai Meuay is, as noted by
Chamberlain (1984:68), “quite widespread”, in Pakkading District it is spoken in only a few
villages.

This section discusses the data collected for this research. The data collection consisted of informal
interviews, in which various topics, such as body parts, colors, kinship, nature, weather, as well as food
gathering and cooking processes, were discussed. One purpose of those exchanges, which were
recorded, was to elicit representative monosyllabic words for each of the twenty boxes in Gedney’s tone
diagram. Whenever such words appeared during the interviews, speakers were subsequently invited
to pronounce them in citation form at least twice.

The data analysis showed both uniformity with respect to phonological and lexical
features and diversity with respect to tonal features. The focus in this section is on tonal
characteristics. The PRAAT program (Boersma and Weenink 2021) was used to obtain measurements
of fundamental frequency for each tone in the monosyllabic words elicited from speakers. Auditory
judgements by some speakers, those whom we could meet again to discuss the data after the tone
analysis, were also taken into account.
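
How F0 measurements translate into tone numerals can be illustrated with a short sketch. The article does not spell out its exact conversion procedure, so everything below is an assumption made for illustration: the speaker's F0 range is divided into five equal bands on a logarithmic scale, and the function names, the banding rule, and the sample values in hertz are all hypothetical.

```python
import math


def chao_level(f0_hz, f0_min, f0_max):
    """Map one F0 value (Hz) to a level from 1 to 5 within a speaker's range.

    Assumption: five equal bands on a logarithmic scale between the speaker's
    lowest and highest measured F0. Other banding rules are equally possible.
    """
    span = math.log(f0_max) - math.log(f0_min)
    position = (math.log(f0_hz) - math.log(f0_min)) / span
    return max(1, min(5, int(position * 5) + 1))


def tone_numerals(points_hz, f0_min, f0_max):
    """Join the levels of the starting point, change points, and end point."""
    return "".join(str(chao_level(p, f0_min, f0_max)) for p in points_hz)


if __name__ == "__main__":
    # Hypothetical speaker range and contour, chosen only to exercise the code.
    print(tone_numerals([100, 150, 200], f0_min=100, f0_max=200))
```

Because band boundaries depend on the chosen scale and on each speaker's range, a sketch of this kind will not necessarily reproduce the tone numerals given in the tone diagrams.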

Map 1 shows both the locations whose varieties are dealt with in the literature (see section 3), i.e.,
Keng Bay (1), Nava (2), Phônsy (Khamkeuth) (3), Namphao (4), and the locations for whose varieties
this article proposes tone diagrams, i.e., Phayat (5), Chomthong (6), Phônsy (Pakkading) (7), Thôngkhe
(8), Nong-o (9), Hinngôn (10).


Map 1: From Google Maps (https://www.google.com/maps) ©2021 Google.
4.1 Phayat (Khamkeuth)

The interview in Phayat Village, which took place in July 2016, was with a female speaker in her forties. 
The tone diagram for the variety she speaks is given in Figure 5. That speaker was interviewed again in 
March 2018. 


Figure 5: The tone diagram for Phayat Tai Meuay. 


[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


The Phayat variety has five tones in the A, B, and C columns and three tones in the DL and DS columns.
The 123-4 split is found in the columns displaying tone splits. The DS column, which has no split, has
a rising tone (DS1234 [25]). On the basis of the measurements of the fundamental frequency obtained
for the tones in the B123 and DL123 boxes, and in the B4 and DL4 boxes, there is a B-DL
quasi-coalescence: B ~ DL, with B123 [342] ~ DL123 [343], and B4 [125] ~ DL4 [124]. In addition,
there is a B4-C4 coalescence (B4 [125] = C4 [125]). The tones in B4 and C123 are both rising tones (B4
[125] and C123 [245]), but their contours are different, and there is no coalescence. The last two tonal
features (B4 = C4 and B4 ≠ C123) were confirmed by the speaker’s auditory judgement. Various types
of shading in the tone diagram for Phayat reflect the following sets of tonal features: B ~ DL (with B123
~ DL123, B4 ~ DL4), B4 = C4, and DS1234.


4.2 Chomthong (Viengthong) 

The interview in Chomthong, nowadays the headquarters of the subdistrict bearing the same name in 
Viengthong District, took place in November 2017 with a female speaker in her forties. Figure 6 shows 
the tone diagram for the variety she speaks. 


Figure 6: The tone diagram for Chomthong Tai Meuay. 


[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


As the tone numbers in the B4 and C4 boxes (B4 [334] ~ C4 [335]), and the F0 measurements they
represent, are close to each other, the six tones in the A, B, and C columns reduce to five;
there are three tones in the DL and DS columns. The tone splits and mergers in the tone
diagram for Chomthong can thus be related to those displayed in the tone diagram for Phayat. Tone shapes,
however, are different. For example, the tone in the DS column is a low-level one (DS1234 [22]). When
considering the B-DL coalescence, we have B123 [541] = DL123 [541], and B4 [334] = DL4 [334] for
Chomthong, as opposed to B123 [342] ~ DL123 [343], and B4 [125] ~ DL4 [124] for Phayat. As for
the tones in B4 and C123, they have different contours (B4 [334] ≠ C123 [55]) in Chomthong, and there
is no coalescence. The various types of shading displayed in the tone diagram for Chomthong reflect
the sets of tonal features summarized as B = DL (B123 = DL123, B4 = DL4), B4 ~ C4, DS1234.


4.3 Phônsy (Pakkading)

With respect to Phônsy (Pakkading), this article takes into account the varieties spoken by two speakers.
The first speaker who provided data is a former student of the National University of Laos in her
twenties. She left her native place, Phônsy (Pakkading), at the age of 10, and then lived in another part
of Bolikhamxay Province (Thaphabat District), where Lao varieties are spoken, but her family
continued to speak the Tai Meuay of Phônsy (Pakkading). She was interviewed in August 2018 at
Nongkhai (Thailand), where some of her relatives run a restaurant. The tone diagram for the variety she 
speaks is given in Figure 7. 


Figure 7: The first tone diagram for Phônsy (Pakkading) Tai Meuay.

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


This first tone diagram for Phônsy (Pakkading) has five tones in the A, B, and C columns and three
tones in the DL and DS columns. Although this speaker has lived in Lao-speaking
environments since the age of 10, her tone diagram displays a 123-4 split in the A, B, C, and DL 
columns, the columns which have tone splits. Lao being characterized by quite different splits in the A 
and C columns (A1-23-4, C1-234), the language she spoke during the interview was definitely not Lao. 
As the tone diagram for her variety can be related to the tone diagrams for the Tai Meuay varieties of 
Phayat and Chomthong discussed above, it can be inferred that she still speaks Tai Meuay. The tone in 
the DS column is a mid-level one (DS1234 [33]). When it comes to the B and DL columns, which both 
display a 123-4 split (B123-4 and DL123-4), measurements of the fundamental frequency for the tones 
in the B and DL columns support a B-DL quasi-coalescence (B ~ DL): the tones in the B123 and DL123 
boxes are both falling tones, albeit with different tone numbers (B123 [553], DL123 [441]); as for the 
tones in the B4 and DL4 boxes, they can be more easily related to each other (B4 [335] ~ DL4 [325]). 
As in Phayat, B4 merges with C4 (B4 [335] = C4 [335]). One will further note that, in this first tone
diagram for Phônsy (Pakkading), the tone in the B4 box, a mid-rising tone, can be compared with the
tone in the C123 box (B4 [335], C123 [445]). However, we could not meet that speaker again to
determine whether there could be a merger. In order to clearly show the relationship of this first tone
diagram for Phônsy (Pakkading) to the tone diagrams for Phayat and Chomthong, the types of shading
it displays reflect only the following three sets of tonal features: B ~ DL (B123 ~ DL123, B4 ~ DL4),
B4 = C4, and DS1234. As for the rather complex tone in the A123 box (A123 [5214]), which is
characterized by a voice quality feature, pressed voice, in the third quarter of its duration, it actually
brings us to the second tone diagram for Phônsy (Pakkading).
The second speaker who provided data for the variety spoken in Phônsy (Pakkading) is a female
speaker in her fifties whose family seems not to be related to that of the first speaker. The interview
took place in January 2020 in the village of Phônsy (Pakkading) itself. Figure 8 shows the tone diagram
for the variety she speaks.


Figure 8: The second tone diagram for Phônsy (Pakkading) Tai Meuay.

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


This second tone diagram for Phônsy (Pakkading) has six tones in the A, B, and C columns and three
tones in the DL and DS columns. The tone in the DS column is a low falling-rising tone (DS1234
[212]). There is a B-DL quasi-coalescence (B ~ DL), both columns displaying the same split (B123-4
and DL123-4), and, whether one considers the tones in the B123 and DL123 boxes or those in the B4
and DL4 boxes, the two tones in each pair can be related to each other (B123 [54] ~ DL123 [53], B4
[214] ~ DL4 [213]). This tone diagram lacks the B4-C4 coalescence (B4 [214] ≠ C4 [112]).
Furthermore, unlike the previous tone diagram for the same location, the two rising tones in the B4 and
C123 boxes have different contours (B4 [214] ≠ C123 [35]), and there seems to be no coalescence. The
tone mergers and splits in this second tone diagram for Phônsy (Pakkading) thus appear to be of the
same type as the ones in Chamberlain’s tone diagram for Keng Bay Tai Meuay. Accordingly, the
various types of shading it displays reflect the sets of tonal features given as B ~ DL (with B123 ~
DL123, B4 ~ DL4) and DS1234. When it comes to the tone in the A123 box (A123 [5413]), this second
speaker is found to pronounce it with the same voice quality feature as the previous speaker. That voice
quality feature, pressed voice, occurring in both cases in the third quarter of the tone duration, appears
to be a distinctive characteristic of the variety spoken in Phônsy (Pakkading).


4.4 Thôngkhe

A more intriguing tone diagram is the one for Thôngkhe Tai Meuay in Figure 9. The interview, which
took place in July 2016, was with a male speaker in his fifties. That speaker was interviewed again in
November 2017 and March 2018.

The Thôngkhe variety has nine tones in the A, B, and C columns and three tones in the DL and DS
columns. As expected, there is only one tone in each of the A4, B123, B4, C123, DL123, DL4, and
DS1234 boxes. However, the A123 box contains three tones, and the C4 box two tones. The three tones
in the A123 box can be found in A1, A2, and A3 words, as well as with all the possible initial consonants
of A1, A2, and A3 words. Each of the two tones in the C4 box can be found with the initials associated
with C4 words.
Figure 9: The tone diagram for Thôngkhe Tai Meuay.

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


With respect to the tone mergers and splits, the tone diagram for Thôngkhe displays the same pattern
as the tone diagram proposed by Chamberlain for Keng Bay Tai Meuay. The DS column has no
splits, and its tone is a rising one (DS1234 [34]), as in the Keng Bay and Phayat varieties. There is a B-DL
quasi-coalescence (B ~ DL), for which we have a B123-DL123 coalescence (B123 [343] = DL123
[343]), and a B4-DL4 quasi-coalescence (B4 [225] ~ DL4 [235]). While the tone numbers in the B4 and
C123 boxes (B4 [225], C123 [35]) could have supported a kind of coalescence, the speaker’s auditory 
judgement did not support it. When he was invited to listen to B4 and C123 words, such as [kaːᴮ⁴] ‘cost,
fee’ and [kaːᶜ¹] ‘young plant’, [ɲaːᴮ⁴] ‘paternal grandmother’ and [ɲaːᶜ¹] ‘grass’, [mɤːjᴮ⁴] ‘to be tired’
and [mɤːjᶜ¹] ‘Meuay’,²⁴ which he himself had pronounced several times in citation form in previous
interviews, he always identified them correctly. As for the tones in the B4 and C4 boxes, they have
quite different contours. The various types of shading displayed in the tone diagram for Thôngkhe Tai
Meuay reflect the tonal features given as B ~ DL (with B123 = DL123, B4 ~ DL4) and DS1234.

In order to explain the three tones ([51], [423], and [113]) in the A123 box,²⁵ we propose to start
from the tone identified by Kullavanijaya and L-Thongkum (1998:285–286) for the same box (A123
[243]). Consisting of a rising part and a falling part, that tone’s contour is frequent in many Tai Daeng
varieties for words in the A123 box.


• In the speech of the speaker interviewed in Thôngkhe, the rising part being either very short or
absent, the falling tone, high falling in his speech [51], makes sense.


• In what might be a secondary development, the second half of that falling tone can become
slightly rising, hence the second tone shape [423], also found in the A123 box of Chamberlain’s
tone diagram for Keng Bay Tai Meuay.


• Although a tone in the A123 box consisting only of the rising part of the tone shape [243] could
not be found in the Tai Meuay varieties which were investigated, there is such a rising tone in the
A123 box of the tone diagram we give for a Tai Thaeng variety that could be investigated during
this research (see Figure 10). However, the rising tone in the A123 box of the variety spoken by
the speaker of Thôngkhe (A123 [113]) is much lower than the one we have in the Tai Thaeng
variety (A123 [245]).


The case of the C4 box is one for which one cannot focus on F0 measurements only. The two tone
shapes we find in the C4 box ([41ʔ] and [425]) can be found in some Tai Daeng dialects as well. Many
Tai Daeng varieties spoken in Houaphanh and Thanh Hóa have a falling tone in the C4 box, and that
tone can be found to be slightly glottalized. For Tai Meuay, Chamberlain gives a low-falling tone in the
C4 box of his tone diagram for the Keng Bay type (C4 [221]), its second half being characterized by
creakiness. In the variety spoken by the speaker of Thôngkhe, we have a high falling tone with a slight
glottal constriction [41ʔ]. When the glottal constriction is absent, the second half of the tone becomes
rising, hence the contour in the falling-rising tone [425].


24 As the speaker concerned pronounces the ethnonym ‘Meuay’ either [mɯajᶜ¹] or [mɤːjᶜ¹], whether he uses
that word in natural speech or pronounces it in citation form, the parts of the recordings he was invited to
listen to in order to identify [mɤːjᴮ⁴] ‘to be tired’ and [mɤːjᶜ¹] ‘Meuay’ were only the parts in which he was
pronouncing [mɤːjᶜ¹].

25 These three tones in the A123 box are not specific to the speech of that speaker only and actually concern
other speakers of all generations in that particular area, including the female student from Vangko mentioned
at the beginning of section 4.

Focusing on the two falling tones in the C4 boxes of the tone diagrams for the Keng Bay (C4 [221])
and Thôngkhe varieties (C4 [41ʔ]), one will note that such a tone shape in C4, which cannot be found
in the tone diagrams for other Tai Meuay varieties, appears to be specific to the Tai Meuay varieties of
Thôngkhe and Keng Bay. However, we find a similar contour for the tone in the C4 box, with the same
tone numbers as those in our interpretation for Keng Bay Tai Meuay (see Figure 2), in the Tai Thaeng
variety investigated in Nong-o Village,²⁶ whose tone diagram is given in Figure 10 for comparison. The
Tai Thaeng data on which it is based were elicited from a male speaker in his forties during an interview
which took place in November 2017.


Figure 10: The tone diagram for Nong-o Tai Thaeng. 


[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


This tone diagram will not be discussed in detail. We will simply note that, with a B-DL
quasi-coalescence and the lack of splits in the DS column, the tone mergers and splits in this particular Tai
Thaeng variety have the same pattern as those in the Tai Meuay varieties spoken in Keng Bay and
Thôngkhe. Such proximity, however, only highlights the main difference between the former (Tai
Thaeng) and the latter two (Tai Meuay) varieties: the tone diagram available to us for Tai Thaeng is
precisely characterized by a B4-C123-DL4 coalescence, supported by tone numbers and the F0
measurements they represent. When it comes to Tai Meuay varieties, the use of the PRAAT program
and speakers’ auditory judgements could not support an indisputable B4-C123 merger in any Tai Meuay
variety. In the tone diagram for Nong-o Tai Thaeng, the sets of tonal features given as B ~ DL (B123 ~
DL123, B4 = DL4), B4 = C123 = DL4, and DS1234 are reflected by three types of shading.


4.5 Hinngôn

The last tone diagram discussed in this article is the one for Hinngôn, a village belonging to Naxouang
Subdistrict in Viengthong District.²⁷ The interview in Hinngôn took place in November 2017 with a
female speaker in her forties. Figure 11 shows the tone diagram for the variety she speaks.


26 The ethnonym and language name ‘Thaeng’ ([the:n“!]) is spelled ‘Thanh’ in accounts dealing with ethnic
groups and Tai dialects in Vietnam (see section 1). When it comes to Khamkeuth District, Tai Thaeng, a dialect
of the P group, is mainly spoken in a few villages around Nape and Nong-o. Outside that area, Tai Thaeng is
also spoken near Thôngkhe in Nathone Village.


27 From Khamkeuth District to Xiengkhouang Province, Naxouang is the first subdistrict of Viengthong District. 
Figure 11: The tone diagram for Hinngôn Tai Meuay.

[Tone diagram: columns A, B, C, DL, DS; rows 1 Voiceless friction sounds, *s, hm, ph, etc.; 2 Voiceless unaspirated stops, *p, etc.; 3 Glottal, *ʔ, ʔb, etc.; 4 Voiced, *b, m, l, z, etc.]


The Hinngôn variety has five tones in the A, B, and C columns and two tones in the DL and DS columns.
The main difficulty with this tone diagram lies in the B and DL columns. On the basis of the F0
measurements for the tones in those two columns, we obtain a first tone for B1, B2, B3, a second one
for B4, and a third one for DL1, DL2, DL3, DL4. Their tone numbers being [41], [42], [442]
respectively, could there be some kind of coalescence?

As the author of this article could not visit Hinngôn again to meet the speaker concerned and further
assess the relationship between the three tones in the B123, B4, and DL1234 boxes, we must first take
a close look at the F0 measurements which the tone numbers given in the tone diagram represent.


• The tone numbers [41] for the B123 box represent a tone whose starting point is 265 Hz.28 During
its first quarter, the F0 curve rises to reach a change point (284 Hz). Then, from the second quarter
to the last quarter, it gradually falls to 197 Hz.

• The tone numbers [42] for the B4 box represent a tone with a similar contour. The F0 curve first
rises from 266 Hz to 277 Hz, the latter value of the F0 being reached in the middle of the second
quarter. Then it falls to a comparatively higher end point (234 Hz).

• The tone numbers [442] in the DL1234 box represent a tone whose F0 curve rises from 276 Hz to
294 Hz during its first half. Its change point is in the exact middle of the F0 curve, and it then
gradually falls during its second half to 249 Hz.
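The conversion from F0 values to tone numbers can be illustrated with a minimal sketch. It assumes that the speaker's F0 range (197 Hz to 330 Hz) is divided into five equal-width bands on a linear Hz scale; the article does not state how the five-point scale was applied, so the function name `chao_level` and the linear banding are illustrative assumptions only. The levels it returns need not coincide with the published tone numbers, which also reflect the author's judgment of the overall contour.

```python
def chao_level(f0_hz, f0_min=197.0, f0_max=330.0, levels=5):
    """Map an F0 value (Hz) to a pitch level 1..levels using equal-width bands.

    Assumption: linear banding in Hz over the speaker's range; a logarithmic
    (semitone) scale would be an equally defensible choice.
    """
    if not f0_min <= f0_hz <= f0_max:
        raise ValueError("F0 value lies outside the speaker's range")
    band = (f0_max - f0_min) / levels
    # The very top of the range is assigned to the highest level.
    return min(levels, int((f0_hz - f0_min) // band) + 1)


# Start, change point, and end point reported for the tone in the B123 box.
b123_points = [265, 284, 197]
print([chao_level(f) for f in b123_points])
```

Under these assumptions the change point and end point of the B123 tone fall in levels 4 and 1, in line with the falling part that the tone numbers [41] summarize.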


Taking into account the significant difference between the falling part of the tone in the B123 box (with
an F0 falling from 284 Hz to 197 Hz) and that of the tone in B4 (with an F0 falling from 277 Hz to 234 Hz),
one can reasonably conclude that there is no coalescence between these two tones. In addition, with a
split in B (B123-4) and no split in DL (DL1234), there is definitely no B-DL coalescence (B ≠ DL).
One will also note the following tonal features: B4-C4 coalescence (B4 [42] = C4 [42]), no B4-C123
coalescence (B4 [42] ≠ C123 [344]), and lack of splits in the DS column (DS1234 [34]). On the basis of
the tone mergers and splits, the tone diagram for Hinngôn can thus be compared with the one which
was created for the Namphao variety studied by Onlao (2010a).
However, when one takes into account the tone contours in the B123, B4, and DL1234 boxes of both
tone diagrams, such a comparison based only on tone mergers and splits seems to have its limits.

• In Sasithorn Onlao’s study, the three tones in the B123, B4, and DL1234 boxes have completely
different contours (B123 [442], B4 [112], DL1234 [324]) and cannot be related to one another.

• With respect to the tone diagram for Hinngôn, although F0 measurements do not support a
coalescence between the tones in the B123, B4, and DL1234 boxes, especially between the tone
in the B123 box and the tone in the B4 box, their tone shapes are comparable, because all these
three tones are falling or at least have a significant falling part (B123 [41], B4 [42], and
DL1234 [442]).


28 As for other varieties, Yuen Ren Chao’s five-point pitch scale was applied to all the F0 measurements which
were obtained for the tones of that speaker, those measurements ranging from 197 Hz (end point of the tone in
the B123 box [41]) to 330 Hz (end point of the tone in the A123 box [225]).


With such similarities of the tones in the B123, B4, and DL1234 boxes, the tone diagram for Hinngôn
appears to have some relation to the tone diagram proposed by Chamberlain for Nava. It should 
nevertheless be borne in mind that the Nava tone diagram shows a B-DL coalescence, with no splits in 
the B and DL columns, and does not have a B4-C4 coalescence. 

Accordingly, the various types of shading displayed in the tone diagram given for Hinngôn Tai
Meuay reflect the sets of tonal features given as falling tones with different tone numbers in B123, B4,
and DL1234 (three shades of the same color), B4 = C4, and DS1234.


5 The speakers’ perspective: ethnographic and sociolinguistic information 
This research was also an opportunity to listen to speakers’ views on their ethnic group and language. 


5.1 Ethnographic information: ‘Tai Meuay Katip Nyeu’ and ‘Tai Meuay Katip Noi’ 
Tai Meuay speakers in Khamkeuth District, especially those in the area surrounding Lakxao City, 
identify two main groups of Tai Meuay: the ‘Tai Meuay Katip Nyeu’ — [taj** munaj“! ka? ti:p?!? pv:?!] 
—and the ‘Tai Meuay Katip Noi’ — [taj** munaj“! ka? ti:p?’? no:j“]. These designations refer to the size 
([nv:8'] ‘big, large’, [no:j“] ‘small’) of a basket called [ka? ti:p?'*], which Tai Meuay women carry on
their backs using a forehead band.29

With respect to the speakers whose varieties were taken into account for the tone analysis proposed
in the preceding section, all of them except the one interviewed in Chomthong (Viengthong District)30
clearly stated that they belong to one of these two groups.


• The male speaker interviewed in Thôngkhe (Khamkeuth District) is a Tai Meuay Katip Nyeu.
More generally, in the surroundings of Thôngkhe, Tai Meuay are Tai Meuay Katip Nyeu.

• The female speaker interviewed in Hinngôn (Viengthong District) stated that her family and her
husband’s family are Tai Meuay Katip Nyeu, and that there are Tai Meuay Katip Nyeu in the
surrounding area.

• The female speaker interviewed in Phayat (Khamkeuth District) is a Tai Meuay Katip Noi. The
Tai Meuay in Lakxao City, as well as in the area of Thôngkhe, say that Phayat Village is the only
Tai Meuay Katip Noi location in Khamkeuth District.

• The two female speakers interviewed for the variety spoken in Phônsy (Pakkading) are Tai Meuay
Katip Noi. Other villages near Phônsy (Pakkading), such as Phônxay (Pakkading), Nakhuanai
and Nakhuanok, also have Tai Meuay speakers. However, only a few speakers in that area say
they are Tai Meuay Katip Noi. Others know and use only the designation ‘Tai Meuay’.


This research did not investigate possible linguistic differences between Tai Meuay Katip Nyeu
varieties on the one hand and Tai Meuay Katip Noi varieties on the other. The fact that the tone diagram
for Hinngôn is very different from the one for Thôngkhe, with speakers being Tai Meuay Katip Nyeu
in both locations, suggests that it may actually be difficult to identify common features for at least the
Tai Meuay Katip Nyeu varieties. We will simply say that the varieties spoken by the two groups are
mutually intelligible. As for the differences mentioned by individual speakers, they are related to their 
auditory perception of a few segmental features (e.g., length of syllables) and prosodic features (e.g., 
sentence intonation) rather than to the use of specific lexical items. 


29 The Tai Daeng have the same basket and refer to it in the same way. A similar basket, called [ka? dy:p DL3], is
used by both the Tai Meuay and the Tai Daeng, but the weaving is slightly different, and the [ka? dy:p DL3] is
generally of a larger size than the [ka? ti:p?“”].

30 Talking about Chomthong and the Tai Meuay of that area, that speaker did not refer to the Tai Meuay Katip
Nyeu and the Tai Meuay Katip Noi. When asked specifically about those two designations, she said that she
had never heard them.
5.2 Sociolinguistic information: ‘Tai Meuay Dông’ and ‘Tai Meuay Lao’

When the author of this article visited the village of Phônsy (Khamkeuth) in November 2017, Tai Meuay
speakers there, who are Tai Meuay Katip Nyeu as in Thôngkhe Village, explained that they identify
two ways of speaking Tai Meuay.

• The first way is called ‘Tai Meuay Dông’, the term [don**] referring to forest areas. It is the way
Tai Meuay speak their dialect in their original locations, which are supposed to be in remote areas.

• The second way is called ‘Tai Meuay Lao’. It is the way Tai Meuay Katip Noi and some Tai
Meuay Katip Nyeu speak their language when they live with Lao people.


The distinction between these two ways of speaking Tai Meuay was later confirmed by speakers
belonging to other villages, such as Thôngkhe and Vangko.

According to Tai Meuay speakers in Phônsy (Khamkeuth), Thôngkhe, and Vangko, Tai Meuay Lao
is spoken by all the Tai Meuay Katip Noi, such as those of Phayat Village in Khamkeuth District and
those living in Pakkading District. It is also spoken by Tai Meuay Katip Nyeu in villages located on the
main roads, such as Phônxay (Khamkeuth), a Tai Meuay village on the main road from Lakxao to Nakai
where Tai Meuay Katip Nyeu are said to speak Tai Meuay Lao.31

While only some Tai Meuay Katip Nyeu actually speak Tai Meuay Dông, many of them live in
the area comprising the villages of Phônsy (Khamkeuth), Thôngkhe, and Vangko. Tai Meuay Katip
Nyeu speaking Tai Meuay Dông in that area say that they originate from a village called Namuong.32 It
will be noted that, although villages such as Phônsy (Khamkeuth), Thôngkhe, Vangko, and others are
not on the main road between Lakxao and Nakai, they are rather recent locations.33

With respect to the two ways of speaking Tai Meuay referred to as Tai Meuay Dông and Tai Meuay
Lao, their difference is mainly related to word choice. One aspect of the difference between the Tai
Meuay Dông and Tai Meuay Lao ways of speaking involves the use of pronouns. Tai Meuay varieties
have the following first- and second-person pronouns: [?em“] and [ka:4*] ‘I’, [thaw“*] and [mum‘“]
‘you’.


• A speaker uses [?em“] ‘I’ to mean that he or she does not enjoy a high status. That speaker will
then use [mum4*] ‘you’ with a person of a higher status and [thaw‘*] with a person of equal or
lower status.

• A speaker uses [ka:4*] ‘I’ when he or she wants to stress his or her higher status. That speaker
will then use [mumm4*] ‘you’ with a person of the same status and [thaw] with a person of lower
status.


While people speaking in the Tai Meuay Dông way use all four pronouns, those who speak in the
Tai Meuay Lao way still know the four pronouns but tend to avoid [ka:““] ‘I’, [thaw‘*] and [mum‘“]
‘you’, especially with outsiders.

Another aspect of the difference between the Tai Meuay Dông and Tai Meuay Lao ways of
speaking concerns the use of a few lexical items. Here are two examples.

• For ‘rain’, we have [fyn*'] in Tai Meuay Dông (as in Tai Daeng) and [fon*'] in Tai Meuay Lao
(as in Lao).


31 Tai Meuay Lao should not be considered as a recent development of the Tai Meuay language: a speaker of
Phônsy (Khamkeuth) in his fifties recalled he was born in another village, Nakhua, now an abandoned location,
where the way of speaking was already identified as Tai Meuay Lao forty years ago.

32 Namuong is located on a gravel path which starts in Vangko and ends in the vicinity of Nape. From Namuong,
there is a pathway to Vietnam, and walking to Vietnam is said to take three hours. However, the only official
checkpoint in the area is the one on the road from Lakxao to the border via Nape.

33 The Tai Meuay Katip Nyeu speaking Tai Meuay Dông who came to live in Phônsy (Khamkeuth) live with
some Tai Meuay Katip Nyeu speaking Tai Meuay Lao, themselves from other villages. As for the Tai Meuay
Katip Nyeu speaking Tai Meuay Dông who came to live in villages such as Thôngkhe and Vangko, they live
with other Tai groups, the most prominent one being the Tai Bo (Chamberlain 1984:81–82; 1996:11–12).
• For ‘calf (of the leg)’, we have [pi:*” khe:n®*] in Tai Meuay Dông, the form [pi:“7] being specific
to Tai Meuay, and [bi:“? khe:1®] in Tai Meuay Lao (as in Lao).


6 Conclusion: the position of Tai Meuay in the Southwestern Tai branch of the Tai 
language family 

In presenting the Tai Meuay language data from this research, this article has focused on tonal 
characteristics. Although the tonal features discussed in section 4 suggest a linguistic diversity of Tai 
Meuay in the area considered for this study, all the tone diagrams proposed for Tai Meuay varieties 
display both a 123-4 split in columns displaying tone splits and an absence of splits in their DS columns, 
giving a first glimpse of a Tai Meuay language uniformity. With regard to the phonological and lexical 
characteristics identified in the literature, they were found in all the locations which were investigated 
without significant variation, suggesting a more consistent Tai Meuay language uniformity. While the 
previous sections of this article left out phonological and lexical features, they will now be taken into 
account in this conclusion, which focuses on the position of Tai Meuay in the Southwestern Tai branch 
of the Tai language family. 

In Chamberlain’s 1984 account, as well as in Kullavanijaya and L-Thongkum’s 1998 article, Tai 
Meuay is posited as a dialect of the P group related to Tai Daeng. The data elicited for this research 
confirm that Tai Meuay varieties are to be related to Tai Daeng. The main arguments for that placement 
are the following: the absence of splits in the DS column; the reflex [kʰ-] of the initial *ɣ-; the reflex of
the rhyme reconstructed as *-aɯ.


• Tone systems of all the Tai Meuay varieties dealt with in the literature (section 3) as well as in
this research (section 4) exhibit a tonal coalescence in the DS column. According to Chamberlain
(1984:67), “lack of splits in DS column seems to be areal feature of the Hua Phan dialects”. As
Tai Daeng dialects in Thanh Hóa Province (data elicited by the author of this article) are also
characterized by a tonal coalescence in the DS column, it is reasonable to assume that it is a Tai
Daeng feature.

• When it comes to initial consonants, Chamberlain (1984:66) notes the reflexes [k-] and [tɕ-] in
Tai Meuay for words reconstructed with the initials *g- and *dz- respectively. Tai Meuay varieties
share those reflexes with other languages belonging to the P group of the Southwestern Tai branch
in the Tai family. However, in the P group, the reflex [kʰ-] in words reconstructed with the initial
*ɣ- characterizes Tai Daeng, such as [kʰe:n®“] ‘lower leg, shin’, in contrast with [ke:n®*] in Tai
Yo and Tai Dam. The Tai Daeng reflex of the initial *ɣ- is found in all the Tai Meuay varieties
studied in this research.34

• When it comes to the rhyme reconstructed as *-aɯ, Chamberlain (1984:66) notes that the Tai
Meuay reflex is [ɤ:]. As Tai Daeng and Tai Yo have the same reflex [ɤ:] for the rhyme *-aɯ, it
helps separate Tai Daeng, Tai Yo, and dialects related to both languages, from languages such as
Tai Dam, in which the rhyme is [-aɯ]. In connection with the rhyme *-aɯ, the cognate of the Tai
Dam interrogative [daɯ*?] ‘which’ is [lɤ:“*] in both Tai Daeng and Tai Yo. All the Tai Meuay
varieties mentioned in this article have the reflex [ɤ:] for the rhyme *-aɯ and use the interrogative
[lɤ:4*] of Tai Daeng and Tai Yo.


Tai Meuay has been related to Tai Yo, as in the Pangloss Collection (https://pangloss.cnrs.fr/), and to
Tai Dam, as in Glottolog 4.3 (https://glottolog.org/). However, neither the Pangloss Collection nor
Glottolog 4.3 has given references with supporting arguments. The relationship of Tai Meuay with
Tai Dam, which appears to be only a relationship between languages belonging to the P group, might
be based on Vietnamese scholars’ accounts linking Tai Meuay with a location in the Tai Dam area:
“Muong Muoi, which today belongs to the district of Thuan Chau, province Son La” (Vi 1996:32).


34 Considering the initial consonant [kʰ-] in Tai Daeng and in Tai Meuay varieties leads to another important
difference of Tai Daeng and Tai Meuay with Tai Yo. While Tai Yo lexical items such as [ha:w“'] ‘white’,
[haw°'] ‘enter’, or [he:n“"] ‘arm’ have the initial consonant [h-], their cognates in Tai Daeng and in Tai Meuay
varieties are [kʰa:w“!], [kʰaw], and [kʰe:n“']. As a matter of fact, words whose initials were reconstructed
by Li Fang-kuei (1977) as *x- or *kh-, and later by Pittayawat Pittayaporn (2009) as *x-, *y-, or *q-, are never
found to display in Tai Daeng and Tai Meuay varieties the initial [h-], a reflex specific to Tai Yo and other
Tai dialects of Nghệ An and Bolikhamxay related to Tai Yo such as Méne.

Even if Tai Meuay varieties are definitely to be related to Tai Daeng, they appear to have a kind 
of relationship with Tai Yo and other Tai dialects of Nghệ An and Bolikhamxay which can be related
to Tai Yo. That relationship, which can be described in terms of language contacts, mainly involves 
lexical items. 


• Verbs: [ki:4*] ‘eat’, [pa:““] ‘go’, [pe:4*] ‘be’, ['2a:4*] ‘take’, [Pe:4*] ‘do’.

• Pronouns: [ka:4*] ‘I’, [sa:“*] ‘he, she, it’. It will be noted that, although Tai Meuay shares [ka:“*]
as a pronominal word with Tai Yo, [ka:4*] has a different use in Tai Yo and expresses reciprocity.

• Other items shared by Tai Meuay varieties and Tai Yo: [?uk?® ?ik?%?] ‘brain’, in contrast
with [?e:k>'] in Tai Daeng; a tense marker expressing the future tense ([kham‘']); the pluralizer
[muraj©'], which in that case tends to be pronounced [my:j°']; the phrase referring to the moon
[ma:k™"! buran‘?], as against [to:4? buan4?] in Tai Daeng.


In contrast with the relationship of Tai Meuay with Tai Yo and other Tai dialects of Nghệ An and
Bolikhamxay related to Tai Yo, the relationship of Tai Meuay with Tai Daeng appears to be a deep 
linguistic and ethnolinguistic relationship. Chamberlain (1984:68) makes the following remark. 


“The lineage names are identical to those of the Red Tai. There is a red band around the top of the 
woman’s sarong which is worn tied above the breasts, also like the Red Tai.” 


This article has noted that the Tai Meuay share with the Tai Daeng the basket called [ka? ti:p?!*]. A 
last example of that deep relationship is the fact that both Tai Daeng and Tai Meuay share a lexical 
particularity: they use one word for ‘mother’ ([?e:“*] in Tai Daeng, [me:“*] in all
Tai Meuay varieties) and a different word to refer to female animals ([me:®*]). In Lao and
most Tai languages spoken in Laos, the form [me:**] is used for both meanings.
Abstract:

This paper continues ideas introduced by Sidwell (2020) concerning Austroasiatic 
prehistory and the Neolithic transformation in Mainland Southeast Asia to propose a 
fundamentally new vision of the AA dispersal. I speculatively propose that a significant 
proportion of early AA speakers were oriented to estuarine environments—as opposed to 
inland or upland ones as has been commonly assumed. I abandon my earlier (Sidwell 2009, 
2010 and elsewhere) proposed AA homeland on the middle Mekong, in favor of a Red 
River Delta locus of dispersal, from which settlers emigrated variously upstream (Northern 
AA speakers) and coastally around Indo-China to the Malay peninsula and India. This 
coastal movement may have been facilitated by improved watercraft, perhaps mediated by 
interaction with early Austronesians. The hypothesis is consistent with the aquatic 
component in the early AA lexicon (Blench 2018), and the emerging archaeological 
indications of the coastal spread of Neolithic rice farmers. 


Keywords: Austroasiatic, homeland, migration, reconstruction 
ISO 639-3 codes: vie, kuf, bru, bdq, sti, lbo, kjg, mlf, puo, pll, vom, mnw, cbn, khm,
cog, khr, pcj, srb 


1 Introduction: the AA homeland reviewed 

Over the past century or so, many ideas about the Austroasiatic (AA) homeland and migrations have 
been expounded, with sharply conflicting assumptions, and reliance on divergent kinds of evidence. 
Below, we briefly survey the range of proposals, before moving on to the new hypothesis. The extant 
range of AA homeland proposals are grouped below into seven broad categories, based on the 
geographical assumptions and the kinds of evidence regarded as significant. These are listed as follows: 


1) Central China: It may be assumed that as Neolithic cereal farmers, AA correlates with ancient rice 
domestication about the middle Yangtze, potentially in relation to other Asian language families. The 
idea is particularly associated with the observation by Norman & Mei (1976) that the name of the 
Yangtze River in ancient Chinese resembles a generic AA word for ‘river’ (although the thrust of the 
N&M paper favors a Southern China hypothesis). Scholars such as Starosta (2005), Sagart (2003, 2011) 
and others, have posited a general origin of Asian languages in central China, citing broad typological 
correlations between multiple families: 


The shared linguistic typology just described can be due to a very old genetic relationship between 
AA and STAN1 [...] or to diffusion. In either case, a period of geographical closeness between
languages ancestral to AA and STAN must be assumed. 

(Sagart 2011:355) 


While the broad explanatory power of the central China hypothesis also partly explains its 
attractiveness, no convincing body of lexical coincidences has emerged to support the hypothesis. 


1 STAN = Sino-Tibetan-Austronesian.
2) SW China: This model puts AA inland and upland, at or near the convergence of the great rivers
emerging out of Tibet: the Salween, Mekong, Yangtze/Jinsha. The basic assumption is that communities
migrated downstream to their present locations; this is ostensibly reasonable, as it potentially explains the
movement of AA speakers into both Indo-China and Eastern India. It is articulated by Blust (1996) and 
Peiros (2011), in particular. 


3) Southern China: The regions of Yunnan, Guangdong, Guangxi, and Fujian were inhabited by Yue or
barbarians, who were variously absorbed or driven south by the Chinese from Han times onward. The hypothesis
was particularly supported by Norman & Mei (1976) comparing a handful of Yue and Chinese words 
with AA etyma; although a widely cited paper, the claims were essentially refuted by Sagart (2008). 
Schuessler (2006) controversially proposes hundreds of Chinese-Khmer lexical parallels (ignoring 
advances in AA reconstruction), suggesting that AA speakers were present in southern China and 
maybe as far north as Shandong. The southern China hypothesis is also supported by archaeologists and 
geneticists (e.g., Bellwood 2021, Higham 2021, Lipson et al. 2018, McColl et al. 2018) as evidence is
strongly emerging that cereal agriculture came into Indo-China from southern China during the 
Neolithic. 


4) Indo-China (central or northern): The diversity of AA branches in Indo-China, with half the family 
more or less along the Mekong, suggests a locus of origin and riverine dispersal vectors (e.g., Sidwell 
2009, Sidwell & Blench 2011). Forms of the hypothesis go back to Schmidt’s (1906) Austric proposal, 
which saw the overlap of AA and Chamic as indicating dispersal of Austric from Indo-China. 
Historically, northern Indo-China overlaps with Yue China, and the Red River Delta is ambiguous in 
this respect, so a hard line is not drawn between these here. Any hypotheses reliant on interpretation of 
language geography and diversity are dependent on phylogeny, and this remains controversial in regard 
to AA. 


5) Bay of Bengal coast~hinterland: The center of diversity argument shifts eastward when one places 
special emphasis on Munda as a primary branch, and this is proposed by van Driem (2001, 2007) and 
Diffloth (2005, 2009). These scholars also note that the environment around the Bay of Bengal includes 
animals and plants indicated in the proto-AA lexicon, as well as giving medial proximity to both India 
and Indo-China. Presumably the locus ranges from the Ganges-Brahmaputra delta to the Irrawaddy 
Delta, and the Patkai Range in between. Although appealing on grounds of simplicity, this hypothesis 
lacks archaeological or other interdisciplinary support. 


6) Eastern India: Among South Asia-oriented scholars there has been a strong tendency to identify 
India as the homeland of AA, and the Munda as an especially ancient population there. For example, 
Peterson (2017) makes a typological case that AA constitutes a prehistoric substrate in the eastern half 
of northern South Asia, investigating substrate hypotheses that go back to Kuiper (1948). Van Driem 
(2001) characterizes Munda as the most linguistically conservative AA branch, echoing Pinnow’s 
(1963, 1966) discussions of Munda morphology and confidence that proto-AA was extensively 
prefixing and suffixing. Typological arguments have been taken further, with Donegan & Stampe 
(1983, 2002, 2004) arguing that Munda independently shifted in grammatical and rhythmic structure, 
and: 


This suggests that the Austroasiatic people may have dispersed from South Asia rather than South- 
East Asia, and the shift of Munda from rising to falling rhythm, after the eastern language had moved 
eastward, may have been the cause rather than the effect of the profound polarization of South and 
South-East Asian language structures. 

(Donegan & Stampe 2004:27) 


However, the typological arguments have been strongly challenged recently (see, for example, Anderson
2020; Jora & Anderson, this volume), and the great antiquity of Munda in India is open to challenge.
7) Punjab: Witzel (1999 etc.) has argued that Vedic shows signs of a local substrate with AA features
(especially particular prefixes), which he calls “Para-Munda” and regards as an ancient AA language,
implying AA speakers in the Punjab. He also speculates that an AA language may have influenced languages in
the Himalayas and eastern Afghanistan and had links to Sumerian. He speculates, “If indeed so, the
speakers of (Para-)Austro-Asiatic would have been builders of a number of great civilizations, from
Mesopotamia to Pakistan/India, Burma and Cambodia” (Witzel 1999:12). While this is not an explicit
claim of homeland location, Witzel places AA in western India prior to 1500 BC, with a clear 
implication that the language subsequently spread eastward. 


It is clear that there is no obviously correct solution to the AA homeland problem; well-informed scholars
have come to widely divergent and entrenched views. We would hope that, if real progress had been
made in recent years or decades, views and evidence would be converging on some kind of consensus,
with the clarity of the Austronesian out-of-Formosa hypothesis (see Blust 2019 for a recent overview).
Another aspect of the problem is that the secondary literature, such as reference works and interdisciplinary
papers, has mostly failed to reflect the diversity of views on the subject of the AA dispersal. Specific
claims and hypotheses are cited without critical engagement, gaining authority by dint of repetition and 
re-citation. Regrettably, these days AA studies is a small and geographically dispersed field, and 
without a critical mass of interested scholars and students it is difficult to sustain constructive 
discussions that can move things forward. 

The homeland hypotheses reviewed here rely on a diversity of evidence and inferences, and while 
these can offer attractive narratives, mostly they are articulated without assessing the strengths of 
counter claims. We see this quite starkly, for example, in the fact that the India and China homeland 
claims have been supported by both lexical and typological comparisons. Yet those comparisons are 
not tabled against those of the competing hypotheses nor their strengths and weaknesses discussed. 
Furthermore, there are dependencies between the kinds of evidence that are used: for example, what
are the probabilities associated with lexical similarities given particular typological alignments? What 
are the probabilities particular typological parallels will occur independently among neighbors, or 
would emerge from selective lexical comparisons? These are interesting questions and also go to 
general questions of how we assess the strengths of claims made about reconstructed and ancient 
languages (i.e., paleolinguistics). 

Given this state of AA homeland views, we find maps like those presented here as figures 1 and 2, 
with arrows showing suggested migration paths in directly opposite directions. Such maps published 
before 2019, as far as I can tell, all have something in common: they imply primary movement overland 
and along rivers, with only the Nicobarese crossing substantial waters. The principle is so neat that the 
homeland problem seems to resolve to a discussion about whether peoples moved upstream or 
downstream here or there, such that we might draw our arrows going clockwise or anti-clockwise 
depending on our favored origin point. This may make for straightforward narrative, but it does little to 
actually test competing claims on their strengths, and it boxes any discussions into an inland riverine 
perspective at the expense of other possibilities. 
Figure 1: “The Southeastern Riverine hypothesis for the Austroasiatic dispersal”
(Sidwell & Blench 2011:339, figure 6) 
Figure 2: “...schematic representation of the routes of migration of the different Austro-Asiatic
linguistic subgroups of India.” (Kumar et al. 2007:49, figure 1).
Another commonality among considerations of the AA homeland is the assumption that ancient AA speakers were cereal
cultivators before they dispersed into their recognized branches, based on the distribution of cereal-related
lexicon (see Sidwell & Rau 2014 for an overview). If this is accepted, it has strong implications
for the possible AA center of dispersal. A very old chronology, say more than 7ky BP, effectively forces 
a homeland identification somewhere in China, proximal to the Yangtze, since we can confidently 
identify millet and rice in central China at such a time depth (in fact domestication of millet occurred 
further north than rice and may have led the spread of cereal cultivation chronologically (Stevens et al. 
2021)). Such early dates also seem to rule out an Indian homeland as archaeology now indicates that 
rice and millet cultivation was practiced in the Indus Valley around 4400-4200 BP, and came somewhat 
later to the Ganges (Bates et al. 2017, Petrie et al. 2016). 
A very old chronology for AA has found independent statistical support since the 1990s, tending
to put proto-AA in a 7~9ky BP window. This includes the glottochronology of Peiros (1998, 2004) and 
the calibrated chronology by Greenhill using the 200-word data set that this author compiled (Sidwell 
2015). To these we can also add Diffloth’s (2005) 7,000-plus year chronology based on intuitive 
reckoning. While these dating estimates would seem to support a central China homeland, we have the 
problem that AA shows a rice vocabulary which is independent from other Asian language families 
(see Sagart 2011), and scant indication of any lexical parallels between AA and Sino-Tibetan or other 
families beyond some odd lexical similarities (e.g., Norman & Mei 1976).2

I take the view that the results of the various statistical studies have to be treated with great caution. 
Rates of lexical change may have been highly variable in the past, and the very low cognate scores that 
underlie the older dates may reflect uncontrolled factors such as word tabooing, and founder effects 
arising from small groups migrating to remote locations and their interactions with autochthonous
populations. Consequently, analyses that include, for example, Munda or Nicobarese, may be skewed 
in ways we do not understand, and removing such outlying branches from consideration significantly 
reduces the age estimate. 

Thomas (1973) applied the glottochronological replacement rates specified by Gleason (1955)3 to
the lexicostatistical set of Thomas & Headley (1970), which did not include Nicobarese, Munda, or 
Pakanic, and found a root date squarely within the Indo-Chinese late-Neolithic. Thomas found that the 
calculations, “...could point to a mass dispersal from some central Mon-Khmer homeland some time 
during the 2nd millennium B.C.” (1973:139). Specifically, a date of 3800 BP is indicated for 20% of 
cognacy, which is in the lower range of the inter-branch percentages, and thus can be regarded as 
indicative. 
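The arithmetic behind such a date can be sketched with the classic glottochronological formula t = ln(c) / (2 ln(r)), where c is the share of cognates retained between two languages and r is the retention rate per millennium. The rate of 0.81 used below is an assumption (a commonly cited constant for longer word lists) and is not stated in this article; the function name `divergence_time_ky` is likewise illustrative.

```python
import math


def divergence_time_ky(cognate_share, retention_per_ky=0.81):
    """Estimate time since separation, in millennia, from a shared-cognate ratio.

    Classic formula: t = ln(c) / (2 * ln(r)).
    The default retention rate r = 0.81 is an assumed constant, not a value
    taken from the article.
    """
    if not 0 < cognate_share <= 1:
        raise ValueError("cognate_share must lie in the interval (0, 1]")
    if not 0 < retention_per_ky < 1:
        raise ValueError("retention_per_ky must lie strictly between 0 and 1")
    return math.log(cognate_share) / (2 * math.log(retention_per_ky))


# 20% cognacy, the lower range of the inter-branch percentages cited above.
print(round(divergence_time_ky(0.20), 2))
```

With this assumed rate, 20% cognacy corresponds to roughly 3.8 millennia, close to the 3800 BP figure quoted above; a slightly different rate shifts the estimate by a century or two, which illustrates how sensitive such dates are to the chosen constant.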


2 AA homeland and the Red River Delta 

If we do locate the AA dispersal in time around, say, 4000 BP, a dispersal center in southern China or 
Indo-China becomes a serious possibility. Studies in recent decades across multiple disciplines support 
the view that agriculturalists from southern China began to migrate into the Red River Delta (RRD) and 
other parts of northern Indo-China from the last centuries of the fifth millennium BP, bearing the 
“Neolithic package” that included rice, millet, pigs, dogs, incised pottery, and supine burials, attested 
at sites such as Man Bac. This yielded a genetically mixed population, hence the term “Two-layer 
hypothesis”: 


The traditional archaeological evidence, including ceramic vessels and stone adzes, is thus unanimous 
in identifying the settlement of mainland Southeast Asia by at least 2000 BC by migrants from the 
north.... 

(Higham 2021:26) 


... cranial and ancient DNA analyses of the excavated Neolithic skeletons (c. 1900 BCE) from this site 
reveal a remarkable admixture between an indigenous and morphologically Australo-melanesian 
population that was represented earlier at Cồn Cổ Ngựa, and an immigrant Neolithic East Asian 
population that was morphologically related to modern Vietnamese. The latter entered Vietnam from 
the north, according to the ancient mitochondrial DNA record from Man Bac and cranial comparisons 
with Neolithic populations in China. 


Indeed, it is clear that [...] the centuries around 2500-2000 BCE witnessed some remarkable 
cultural and biological changes in Southeast Asia (Hanihara et al. 2012; Higham 2013). South Chinese 
Neolithic populations with food production based on rice, millet, pigs and dogs pressed southwards, in 
the process settling alongside or simply amalgamating the indigenous hunter-gatherer populations 
(Bellwood 2015:55). 


The resulting Phùng Nguyên culture (4000-3500 BP) of northern Vietnam is thus a prime candidate for 
being either proto-AA or late-AA speaking, with its language the direct ancestor of the Vietic languages 
spoken in the region today (cf. Alves 2021 for further discussion). Among archaeologists who advance the 
two-layer hypothesis, such as Charles Higham and Peter Bellwood, it is supposed that the incoming farmers 
were AA speaking, with the local indigenes shifting to AA speech. This assumption leaves open the 
possibility that AA originated somewhere further north, and that is not disputed here; in this paper we 
are concerned with where, when, and how the AA culture and language dispersed across the region. 
There may have been a pre-AA language that migrated from one location to another before ultimate 
dispersal. Alternatively, AA may have been indigenous to the RRD or some nearby territory, with 
incoming settlers assimilating to the local language. I personally doubt that we will ever have a clear 
answer to this problem, and prefer to focus on the dispersal question. 


2 This is notwithstanding the fact that various claims for lexical parallels have been made. For example, Peiros 
(1998) presents some 15 AA-Hmong-Mien comparisons in support of his wider Austric model. 

3 Thomas specifies 1965, apparently in error. 

The RRD as a locus of AA dispersal also implies that it is the Vietic homeland. While the Vietic 
homeland is not a matter of consensus, various scholars do explicitly identify the pre-Common Era 
Dong Son culture of the Red River Valley with Vietic, such as O’Harrow (1979), Ferlus (2009), Tran 
(2011), Taylor (2013), and Alves (2021), and continuity from Phùng Nguyên to Dong Son, both cultural 
and demographic, appears to be supported by a diversity of evidence. And as with Mon and Khmer, 
there is no evidence that indicates their migration from some inland origin; rather, Vietic speakers seem 
to have been living in the Red River delta, exploiting the rich estuarine environment for farming and 
gathering, for as long as we can tell. 

The RRD, as an estuarine environment, is also consistent with the reflections offered by Blench’s 
(2018) paper: “Waterworld: lexical evidence for aquatic subsistence strategies in Austroasiatic”. That 
paper emerged after Blench and I cooperated on a (2011) paper combining linguistics and archaeology 
to propose a 4000 BP AA dispersal from somewhere along the Mekong, accepting traditional 
inland/riverine assumptions. 

Blench (2018) explores in detail specific lexical indications for the AA homeland, focusing on 
likely utilization of aquatic environments by early AA speakers: 


Although early Austroasiatic speakers were clearly crop producers, growing both taro and rice, if they 
were largely following river basins, aquatic technology and subsistence must have been highly salient 
in their vocabulary. [...] a number of lexical items can be shown to be common to many of the branches 
of Austroasiatic, suggesting them as reasonable candidates for the proto-language. [...]. 

(Blench 2018:192) 


Blench assembles evidence for proto-language items for: 


Fauna: fish (general), catfish, crab, crocodile, eel, heron, otter, pelican, prawn/shrimp, tortoise, 
turtle, turtle (freshwater) 

Fishing: to poison fish, scoop net, fish trap 

Boats: three roots for boat 

Geographical features: large river/sea, river valley, ditch/canal 


Such lexical indications are arguably consistent with estuarine/coastal environments, potentially far 
richer in terms of diverse food sourcing than inland waterways, and demonstrably consistent with Man 
Bac and other Phùng Nguyên sites that have been identified and documented in Northern Vietnam. 
Doubtless, there are many other locations that fit these broad indications if one includes all the coastline 
from the Pearl River delta to the Mahanadi-Brahmani Delta of India, but the RRD has a special 
resonance in this context of the two-layer hypothesis, and so is a strong candidate for the AA locus of 
dispersal. 

A specific linguistic argument can also be made for the primacy of the RRD. Ferlus (2009) argues 
that Vietic preserves evidence of the morphological formation of the word for ‘pestle’ (Vietnamese 
chày) from ‘to husk’ (Vietnamese xay), ultimately from ‘to dig, hollow, excavate’ by <r> infixation 
(*tʃeʔ > *tʃeː > *tʃreː > zeː), speculating that it reflects the specific innovation in Southeast Asia of the 
wooden husking mortar and pestle, which proved easier to make and husked grains with less shattering 
compared to stone mortars. While Ferlus’ point is to suggest diffusion from ancient or pre-Vietic across 
AA, locating the AA dispersal in the Indo-Chinese Neolithic permits us to correlate the dispersal of 
‘pestle’ with the dispersal of AA itself, and even hint at an explanation for that dispersal, i.e., a special 
case of the farming/language dispersal hypothesis (Renfrew 1987 and passim). The wooden pestle may 
have been a key innovation for early AA speakers adopting cereal cultivation, and carried widely as 
AA dispersed, even to be found in Munda. This strikes me as the best explanation for the distribution 
of the ‘pestle’ etymon across the family, as exemplified below: 


Vietic: Vietnamese chày 
Katuic: Kantu ntreː, Bru ntriː 
Bahnaric: Bahnar hdraj, Stieng renaj, Laven ʔrej 
Khmuic: Khmu en‘reʔ, Thin ngreʔ, Ksingmul hagéː 
Palaungic: Palaung greː, Wa niʔ 
Monic: Mon ri’, Nyah Kur yriːʔ 
Khmeric: Khmer ʔonreː 
Pearic: Chong kahi:®" 
Munda: Kharia endi, Gorum in(d)ri, Sora onrij 


The root reconstructed *tʃeʔ ‘to dig, hollow, excavate’ by Ferlus is phonologically marked, as the *tʃ 
segment is rare in the reconstructed proto-Vietic sound system, and generally not reconstructed for 
proto-AA. Ferlus speculates that *tʃ was part of a layer of “Dongsonian vocabulary” that Vietic speakers 
took on as they migrated into the RRD region, correlating with the demographic “two-layer hypothesis”. 
I take no position here on the origin of *tʃ in AA. The dispersal of its reflexes is arguably aligned with 
the primary dispersal of AA. 

Arriving at this point in our discussion, the question arises: by what routes and means did early 
AA groups disperse? What are the pros and cons of the RRD as the principal AA locus of dispersal? Is 
there something about the RRD that suggests a new way of looking at the problem? And is there 
anything else we need to consider? 


3 Maritime Migration? 

Among AA branches, only the Nicobarese were traditionally considered in terms of maritime migration, 
and then as an anomaly in a language family of mainly inland dwellers, reflecting the strong bias 
towards conceptualizing AA migrations as inland/riverine events. That changed recently with the 
“Munda Maritime Hypothesis” (MMH) of Rau & Sidwell (2019), which proposes that pre-Munda 
speakers migrated from Southeast Asia to Odisha (India) across or around the Bay of Bengal. Just as the 
Nicobarese necessarily reached their islands by sailing over open waters in Neolithic times, the MMH 
suggests that the pre-Mundas did something similar. Secondarily, the MMH also asks if the pre-Aslian 
speakers arrived on the Malay Peninsula by coastal navigation. If correct, these speculations imply 
the possibility of an ancient AA coastal/maritime culture on the shores of the Andaman Sea. 

While the MMH marshals multiple lines of argumentation—linguistic, archaeological, 
geographical, and genetic—the main thrust is that the distribution of Munda languages is best explained 
by dispersals upstream and outward from the Mahanadi-Brahmani Delta (MBD). Assuming that the 
MBD was not itself the origin point of AA,⁴ a locus at the MBD is quite significant; the only other AA 
branch present in mainland India, Khasian, shows clear affiliation with the Palaungic languages of 
Myanmar (Sidwell 2011) such that we can confidently suppose a pre-Khasian migration through Upper 
Burma to the Brahmaputra Valley on the way to Meghalaya. Munda seems to have appeared on the east 
coast of India (tentatively correlating with the Eastern Wetland Tradition,⁵ dated by archaeology to 
around 3500 BP) without signs of connections to any AA groups in India or immediately east of the 
subcontinent. In fact, the most striking lexical connections are between Munda and the AA languages of 
Indo-China, especially among the pronouns and numerals of Vietic and Bahnaric, suggesting a direct 
connection to the Vietnam coast. 


4 This is itself a question worth considering, and I encourage it be tested thoroughly. 


5 A Neolithic farming culture that predated the arrival of Indo-Aryans in Odisha. 


Figure 3: Rau & Sidwell (2019:40, figure 2) proposed original dispersal region of Munda languages. 


The case for at least a partial maritime component to the pre-Munda migration is strongly indicated 
by geography. The Ganges-Brahmaputra delta would have presented a significant barrier to migration 
in Neolithic times (see figure 4) and navigation past the mangroves and shallow river mouths has to be 
seriously considered. Furthermore, in such a journey across the northern Bay of Bengal, one finds the 
MBD is the first hospitable landing place once the mouth of the Ganges-Brahmaputra is passed, with 
the adjacent Chilka Lake functioning as a sheltered harbor and desirable estuarine fishery for millennia. 
It is as if the pre-Mundas appeared out of nowhere on the Odisha coast in Neolithic times in just the 
right place for landfall coming from the east. And there can be no doubt that pre-Mundas around 3500 
BP had the capacity to skirt the Bengal coastline; the Nicobarese settlement involved crossing a greater 
stretch of open water than required to skirt the Bengal coastline, so there is simply no basis for ruling 
out coastal navigation as a mode of migration in the Neolithic. 

Given these speculations regarding Neolithic AA navigation west of the Kra isthmus, I began to 
ask whether coastal navigation had been even more widespread, and whether it could provide a wider 
basis to explain the distribution of AA branches. Could coastal/maritime migration provide as good an 
explanation, or a better one, for the location of more AA branches than discussed in the MMH? There are hints 
of such speculations in an observation by Shorto four decades ago: 


The Northern Mon-Khmers and Khasis are likely to have followed what became a Chinese trade 
route to India, as the Mundas may well have done before them. But there seems no overriding reason 
to trace routes for the Mons and Khmers, and other groups who occupied the river-plains, down the 
rivers from the hinterland rather than up them from the coast. 

(Shorto 1979:278) 


Note the phrase “up them [the rivers] from the coast”. It has been a staple of paleo-linguistics of East 
and Southeast Asia that language families dispersed substantially by moving down rivers, particularly 
down those with headwaters in the eastern Himalayas. But what happens if we reverse the logic and 
imagine that the early AA groups were principally coastal and estuarine dwellers whose inland 
migrations began upriver from the coast? Arguably several AA branches of Mainland Southeast Asia 
have their historical loci by the coast in estuarine settings. 

Archaeology has established that the earliest Mon-Dvaravati urban settlements closely 
approximate the highest water line of the Chao Phraya River during the mid-Holocene sea-level maxima 
(Mudar 1999), suggesting that they developed from settlements founded right on the Bay of Bangkok 
coastline in Neolithic times. 

Belonging to a later period, the earliest archaeological indications of Khmer urbanism (the Funan 
period) are found at locations such as Oc Eo in the Mekong Delta; from the first century CE this was a 
trading port on the “Maritime Silk Road” (Stark 2006, Higham 2014). Stark notes: 


At least ninety "Oc Eo" period complexes have been recorded throughout southern Vietnam's Mekong 
delta (Vo Si Khai 2003); contemporary sites have been reported along the coasts of peninsular Siam 
with similar material culture. 

(Stark 2006:149) 


The earliest identifiable Khmer settlements are coastal, around the Gulf of Thailand, while settlements 
further inland or up-river are historically later, as the development of land and water management 
methods allowed for expansion into areas with less reliance on seasonal flooding and other forms of 
natural irrigation. Furthermore, similar sites are found on the Siam peninsular coast, consistent with the 
narratives of Coedes (1968), Wheatley (1961) and others who saw 1st-century Funan as extending from 
the Mekong Delta, around the Gulf of Thailand, and dominating the isthmus including its west coast. 
While this period is nearly two millennia later than the putative Neolithic AA dispersal, it starkly makes 
the point that our contemporary sense of Khmers as oriented to the inland is an artifact of later history: 
the shift of the Khmer political center to Angkor after 800 CE, and subsequent reorientation towards 
an inland empire. 

Over many centuries, various Austronesians came to dominate much of the Mainland Southeast 
Asian coastlines: Chams along the central and southern Vietnamese coasts, Malays around the 
peninsula, Mokens and other Sea Nomads, among others. Studies such as Thurgood (1999) have 
clearly established that there is an old AA substrate in the mainland Austronesian languages, and its 
elements were obviously acquired in contexts of coastal and/or insular interactions. The case seems compelling 
that a significant proportion of ancient AAs were oriented in their culture and economy towards the 
seacoasts, especially deltas and estuaries, leaving an imprint in Cham, Malay, and other Austronesian 
languages. Frankly, to my mind it stretches credibility that ancient AAs, as both farmers and foragers, 
would not take advantage of the opportunities the mainland estuarine environments present. As Higham 
points out: 


The coasts of Southeast Asia, and particularly river estuaries, offer one of the world’s richest habitats 
in terms of natural bounty. On a world scene, hunter-gathers living in such marine habitats can secure 
their territory through permanent occupation and become very rich in social, technological and 
economic terms. 

(Higham 2004:42) 


Deltas and river flats are prime natural locations for rice farming, and deltas such as the RRD have long 
been prime rice production regions. While upland dry rice cultivation is old in Mainland Southeast Asia 
and clearly important for AA (cf. Fuller & Castillo 2021 for overview), and upland cultivators are 
somewhat mobile, they typically do not move far when establishing new swiddens and often cycle back 
over the same ground after an appropriate period of fallowing. Also, upland cultivation is rather low in 
yield and does not support high populations or high population growth relative to lowland rice farming, 
which in favorable conditions can support multiple cropping within a given year. In this context, it 
makes sense to associate lowland rice cultivation with population growth and the hypothesized search 
for new estuarine environments with the kind of long-distance movements needed to account for a 
relatively rapid Neolithic dispersal of AA groups. Thus, we should not be surprised when Castillo 
(2011), citing Thompson (1996), writes: 


The first evidence of domesticated rice in Thailand using macroremains dates to 2000-1500 BCE from 
the Neolithic period in the coastal site of KPD [Khok Phanom Di] 
(Castillo 2011:115) 


Figure 5: Fragment of Castillo (2011:115, figure 1) map showing sites with evidence of rice (Thailand). 

Khok Phanom Di is (9) on Castillo’s map (figure 5), shown right on the edge of the Bangkok plain 
today but four millennia ago located by the shore of the Bay of Bangkok. 

There are potentially earlier sites in central and northern Thailand based on phytolith remains but 
apparently their evaluation remains problematic. The evidence for early millet at these inland sites is 
much better. As Castillo remarks: 


Higham (2002) originally proposed that rice agricultural expansion followed major riverine routes 
and would be archaeologically visible in interior sites, an idea previously put forth for Austroasiatic 
language expansion by Blust (1996). However, Ban Tha Kae and Ban Chiang are the earliest interior 
sites dating to the Neolithic and are reported to have rice cultivation, but the evidence is based on 
rice-tempered pottery, so it may be open to doubt. 

(Castillo 2011:115) 


We lack clear archaeological evidence for the expansion of rice cultivators along riverine routes as 
predicted by the traditional models of AA dispersal. Instead, there are some ambiguous indications of 
rice-husk impressed pottery at some early inland sites (i.e., these may be traded items). On the other 
hand, unambiguously, we have rice being grown by the Bangkok plain in Neolithic times. 
Consequently, I propose that the predecessors of the Mons and the Khmers were AAs sailing the coast 
looking for rich estuarine environments to settle (and perhaps bringing rice with them?). As Shorto 
mused, “...there seems no overriding reason to trace routes for the Mons and Khmers, ..., down the 
rivers from the hinterland...”. Additionally, there are at least two other AA branches that we can 
examine from the same perspective. 

Ferlus (2011) reconstructs the Pearic homeland in the vicinity of Thailand’s Trat Province and 
Cambodia’s Koh Kong Province. While the coasts of these provinces are nowadays dominated by ethnic 
Thais and Khmers, Pearic peoples live only tens of kilometres inland today. At the intersection of these 
provinces is the mouth of the Meteuk River and the modern Port of Koh Kong, which suggests a likely 
landing place for pre-Pearic settlers, providing both sheltered harbor and ready access to the inland. 

Another potentially coastal AA branch is the AA substratum in Chamic, a putative lost AA branch, 
also discussed by Blench (2009). Thurgood’s (1999) historical reconstruction of proto-Chamic 
demonstrated nearly 600 words of non-Austronesian origin in the Chamic historical lexicon, with 
roughly half of these identified as being AA in origin. While Thurgood assumed that this borrowing 
was largely from Bahnaric and Katuic into Chamic in ancient times, the direction of borrowing must 
have been largely the reverse, from Chamic into those branches (Sidwell 2007, 2008). Furthermore, much of the proto-Chamic lexicon 
of AA origin cannot be identified with any specific AA branch, strongly suggesting that an unknown 
AA branch was spoken on the Vietnam central coast and absorbed into Chamic. 

While admittedly circumstantial, we have now identified some seven AA branches whose locations 
can be accounted for by estuarine settlement and potential maritime arrival/departure in competition with 
traditional assumptions of inland origins and movement. However, there do remain AA branches that 
clearly fall outside the possibility of any maritime hypothesis, plus some ambiguous cases. 


4 Inland Austroasiatic 
The Khasian, Palaungic, Mang, Pakanic (and perhaps Khmuic) branches can be loosely grouped into a 
northern AA clade on the basis of sharing a form for the 1st person personal pronoun that reconstructs 
as *ʔɔːʔ (see Sidwell 2014 for overview of classification). Forms within Khmuic are split between 
reflexes of *ʔɔːʔ (Khmu, Mlabri) and *ʔaɲ (all other Khmuic), the latter being widely (although not 
universally) reflected in the rest of AA (including Munda, e.g., Juang, Ho, Mundari /aɲ/ ‘I’). Frankly, 
the status of Khmuic as a unitary branch is somewhat ambiguous, but for the present purposes it clearly 
falls into the category of geographically northern branches. It seems intuitively obvious that the 
distributions of all the Northern branches are readily explained by migrations upstream along the Red 
River and tributaries, with groups branching off as they went, eventually hitting the higher reaches of 
the Mekong and Salween, and ultimately descending to the Brahmaputra (see map at figure 6). 

In addition to the Northern AA groups, Bahnaric and Katuic are ambiguous as their centers of 
diversity are in the hills of the Annamite range, with no apparent history of coastal presence. It has 
always been assumed more or less that these groups ascended the Annamite Range from the Mekong 
side, although it is not at all clear that this is the case. We do see hints of directionality in the expansion 
of Kui people into Thailand and Cambodia, and of South Bahnaric speakers into Cambodia; these look 
to be relatively recent (Common Era) expansions given their internal homogeneity, perhaps related to 
the Khmer Empire providing motivations for lowland settlement. At a deeper historical level, we simply 
do not know who was living on the Indo-Chinese coast before the arrival of the ancient Chams, and it 
may well be that the ancestors of the Katuic and Bahnaric speakers were oriented to the coast in ancient 
times. In any case, there is nothing in particular about the distribution of Katuic or Bahnaric that 
contradicts an initial coastal dispersal, versus the inland alternative. 


5 Conclusion 

Our discussions have brought us to an interesting conclusion: one can account for the distribution of all 
AA branches on the basis of just two primary vectors: (1) inland up the Red River and beyond and (2) 
a coastal dispersal perhaps in one or more pulses, clockwise around the Indo-Chinese coast and around 
or across the isthmus into the Andaman Sea and beyond. We can speculate that AA migrants initially 
took with them rice and millet, and the wooden mortar and pestle for husking grains (with the 
Nicobarese, for example, eschewing these for a primarily fishery and tuber-based subsistence). 
Secondary expansions/migrations after the speculated coastal movements were primarily inland and up- 
river. We can propose that demographically, the first migrations were rather small and were focused on 
finding estuarine sites for settlement, rather than seeking to further ocean fishery or sea nomadism. 
Figure 6 presents a possible scenario for these migrations with approximate chronology. 


Figure 6: Sidwell (2020:26, map 3). Speculative model of AA dispersal with maximal maritime component. 


For more than a century, we have ignored the richest ecosystems, and the fastest modes of travel in 
ancient times, when considering the problem of AA dispersal and homeland. Our thinking has been 
conditioned by images of gradual overland and downriver movements of hunters, dry rice farmers, and 
vegeculturalists. I propose that to some extent, we were projecting into the past our observations of the 
present and especially of those AA groups that today have under-developed economies and relatively 
little mobility, suggesting an unrepresentative image of historical character. Yet we are now beginning 
to recognize that early rice farmers were sometimes capable coastal navigators. Higham (2021:26) 
discussing the introduction of rice to Indo-China states bluntly, ‘It is beyond reasonable doubt that there 
was a rapid coastal colonization by rice farmers.’ Should we assume a sudden abandonment of such a 
strategy once our early colonists had lit upon the fertile land of the RRD? I suggest that it is more likely 
that our colonists remained oriented to the coast for all sorts of practical reasons, but it just happens to 
be a serious challenge for archaeology to investigate ancient coastlines, such that sites further inland 
provide more stable conditions likely to favor preservation and discovery. It is also an issue that 
archaeology favors later eras with their greater production and diversity of artifacts. As Hung et al. 
have expressed so clearly: 


In many ways, the conspicuous archaeological record of the Iron Age has distracted our attention 
away from the likelihood of older cultural links across the South China Sea. In fact, the Iron Age 
connections very likely followed much older sealanes and trade-routes, ... 

(Hung et al. 2013:400) 


Abstract 

The Austro-Tai hypothesis proposes that Kra-Dai and Austronesian are descended from a 
common ancestor, Proto-Austro-Tai, which existed somewhere in Southern China several 
thousand years ago. In this paper, a newly created list of Kra-Dai and Austronesian shared 
lexical items is presented, along with already-proposed shared lexical items from earlier 
works resulting in a list of some 71 shared lexical items which may be ultimately inherited 
from a putative Proto-Austro-Tai language. The list is also used to analyze 
correspondences between Kra-Dai and Austronesian vowels in final syllables. It is shown 
that several regular correspondences are shared by the two families and that these regular 
correspondences most likely date back to a shared common ancestor. 


Keywords: Austro-Tai, Austronesian, Kra-Dai, Historical linguistics 

ISO 639-3 codes: ami, bhp, bnn, bny, bth, btx, bzg, ceb, ckv, doc, enc, iba, ilo, ind, ivv, 
kmc, kys, kzi, kzp, laq, lbc, lic, mak, mkg, msa, nij, nut, onb, pec, pni, sas, skb, smr, 
sne, ssf, tao, tay, tgl, tha, tou, trv, tys, tyz, wun, xnb, yha, yln, yrn, yzg, zbc, zha, zyn 


1 Introduction¹ 

The Austro-Tai hypothesis is a macro-family proposal which posits that Austronesian and Kra-Dai,² 
two major language families of Southeast Asia and the Pacific, are descended from a common ancestor, 
henceforth referred to as Proto-Austro-Tai (PAT).³ The proposal itself is not new: the first 
major publication on the topic, that of Benedict (1942), is nearly 80 years old, and the first ever mention 
of AN-KD relations (Schmidt 1906) is even older. In addition to linguistic similarities, recent genetic 
studies also suggest a link between AN and KD populations (Li et al. 2008). The AT proposal itself, 
however, has not gained wider acceptance, owing both to issues in the comparisons found in Benedict’s 
original publication and to sometimes serious methodological issues in his follow-up manuscripts, 
namely, Benedict (1975 and 1990).⁴ 


1 Data Sources and Abbreviations are as follows: Proto-Austronesian (PAN) and Proto-Malayo-Polynesian 
(PMP) are from Blust and Trussel (ongoing); Proto-Kra (PK), Buyang, Qabiao, Pubiao are from Ostapirat 
(2000); Paha is from Li and Lou (2010). Note that Li and Lou’s 2010 publication is a grammar of “Buyang”, 
which is used as a cover term that includes several dialects. In this case, Li and Lou’s “Buyang” is the Paha 
variety. Proto-Ong Be (POB) and Ong Be are from Chen (2018). Proto-Hlai (PH) and Hlai are from Ostapirat 
(2004). Lakkja is from Fan (2019). Southern and Northern Kam (S. Kam, N. Kam) are from Long et al. (1998). 
Proto-Tai (PT), and all lexical data of Tai languages, including Siamese, Sapa, Bao Yen, Cao Bang, Lungchow, 
Sangsi, Yay, and Saek are from Pittayaporn (2009). Additional abbreviations include AT (Austro-Tai), AN 
(Austronesian), and MP (Malayo-Polynesian). 

2 Kra-Dai is often referred to as Tai-Kadai. 

3 A modification to the original AT proposal is given in Sagart (2004, 2005), where he states that Kra-Dai is a 
daughter, rather than a sister, of AN. Sagart’s proposal places Kra-Dai as a sister to Malayo-Polynesian with a 
Formosan ancestor which he dubs FATK, or “Formosan Ancestor to Tai-Kadai”. For the purposes of this 
paper, it is assumed that any relationship between KD and AN is at most a sister relationship, with an ancestor, 
PAT, giving way to two daughters, PKD and PAN. 

More recent research in AT, especially that in Ostapirat (2005, 2013, 2018), has given new life 
to the AT hypothesis with methodologically strict and consistent research on a core set of shared 
vocabulary between AN and KD while keeping a safe distance from Benedict’s less methodologically 
sound post-1942 publications. Therefore, the present research looks to Benedict’s original 1942 
publication and Ostapirat’s more recent research when evaluating past AT proposals and does not 
include comparisons from Benedict’s later works on the subject. 

The present research is also meant to expand the core set of AT lexical and phonological 
comparisons with newly available datasets from KD. Through analysis of new datasets and application 
of the comparative method, this research is able to expand the number of comparisons to over 70, with 
much of the evidence coming from the Tai branch. The remainder of this paper is organized as follows: 
Section 2 gives background information on the AT Hypothesis itself and organizes a list of possible 
shared lexemes from various publications. Section 3 offers a list of new shared lexemes which are then 
added to the existing list. Section 4 discusses vowel correspondences between KD and AN. Section 5 
concludes. Something to keep in mind when reading the paper is that KD and AN have differing 
orthographic conventions. AN y is equivalent to KD j, both indicating IPA [j]. AN @ is equivalent to KD 
syn. AN *z is equivalent to IPA [dʒ], and AN *j is typically considered to be equivalent to IPA [gʲ]. In this 
study the different orthographical practices sometimes result in comparisons between AN -y and KD -j. 


2 The Austro-Tai hypothesis 
As already stated, the AT hypothesis is not new, but AT itself is not typically considered a “mainstream” 
language family. Before getting into the specifics of the proposal, some background on AN and KD is 
necessary. AN and KD occupy mostly nonoverlapping territories in Southeast Asia. AN languages are 
spoken in Taiwan, the Philippines, Malaysia, Indonesia, New Guinea, Madagascar, and throughout the 
Oceanic region. KD languages are spoken in Southern China, Laos, Thailand, parts of Myanmar, 
Vietnam, and Cambodia. The divide between AN and KD roughly follows the boundary between 
Mainland and Island Southeast Asia, although this division is fuzzy and exceptions are found. 
Member languages in the two families tend to be typological opposites in many respects. In terms
of phonological typology, conservative AN languages tend to have (C)V(C) syllables, four to six
vowels, and disyllabic canonical words, and they are nontonal. As for syntactic features, conservative
AN languages exhibit complex verbal morphology, the voice systems characteristic of AN, and
verb-initial word orders. KD languages, in terms of phonological typology, have more complex syllable
shapes such as (C)CV(:)(C), have larger vowel inventories including a higher prevalence of diphthongs,
have canonically monosyllabic or sesquisyllabic words, and are tonal. Regarding morphosyntax, KD
languages have little affixational morphology and a relatively strict SVO word order. A potentially
interesting outcome of pursuing a research program with the AT hypothesis is learning both the nature
of the proto-language itself and the mechanics of syntactic and phonological change that resulted in
such opposite typologies.


In these later studies, Benedict attempts to strengthen the AT hypothesis via crude means. For example, he
expands the family to include Miao-Yao in 1975, and further to include Japonic in 1990. Both of these
inclusions allow for almost unlimited sources for his search for similar words, but the inclusion of these two
families is not widely supported. Another issue is the direct comparison of KD lexemes with both individual
languages in AN and with late-stage proto-forms that contain innovated phonemes not present in PAN. An
example of the latter is his direct comparison of Proto-Oceanic labio-velars with KD lexemes even though
such phonemes were Proto-Oceanic innovations (Reid 1984-1985). Finally, Benedict expanded his
comparison list by reconstructing complex and sometimes unnatural sequences of consonants to PAT with
liberal use of parentheses and square brackets. Blust (2014) uses the term “proto-form stuffing” in his criticism
of Benedict’s 1975 comparison between AN ‘forehead’ and KD ‘face’. To quote Blust (2014:310), “In order
to relate what [Benedict] called “Indonesian” *[dd]a?ay=[dd]a?ay ‘forehead’ and related forms in Formosan
languages such as Thao shaqish, Bunun daqis ‘face’ to Proto-Tai *hna ‘face’, he posited Proto-Austro-Tai
*(q/)(n)dza[q]ai[s] ‘face, forehead’.”

Regarding the nature of the relationship between AN and KD, the proposals below assume that
shared vocabulary is the product of inheritance from a distant common ancestor. Although it has been 
argued that shared vocabulary is not inherited (Thurgood 1994), this study considers the evidence 
sufficient for a genetic relationship hypothesis. Evidence in the form of shared vocabulary contains two 
important features, namely, the tendency for shared vocabulary to be “basic” and the regularity of sound 
correspondences between suspected cognates. The vocabulary listed in this and previous studies on the 
AT hypothesis overwhelmingly comes from basic vocabulary. Ostapirat (2013) makes a similar 
observation and notes that Chinese vocabulary in KD tends to be non-basic vocabulary whereas AN 
vocabulary tends to be basic. Regarding sound correspondences, Ostapirat (2005) laid a solid 
foundation of regular sound correspondences between AN and KD in both consonants and tone. 
Regularity of this kind is typically viewed as evidence for inheritance. Additionally, as discussed
in Section 4, there is regularity in vowel correspondences, which provides further evidence
in support of the hypothesis that these lexemes are inherited from a common ancestor.

The current research is mostly dedicated to identifying and examining potential cognates between
the two families. The current inventory of AN/KD potential cognates comes from several sources stretching
over multiple decades. Because of this, an attempt is made to organize a more comprehensive list before
moving on to the original portion of this paper. The review begins with Benedict (1942), who gives a
large list of potential cognates, and then turns to data sets from Ostapirat (2005, 2013).


2.1 Benedict’s comparisons 

A list of comparisons from Benedict 1942 is given in Table 1, with the PAN reconstructions according 
to the original sources and modified wherever necessary to reflect current PAN orthographical practices. 
Benedict’s reconstructions were based on Dempwolff (1937), and therefore need much modification to 
make them compatible with modern PAN orthography. Benedict also compared some KD words directly 
with Indonesian, not with a reconstructed AN proto-language. In these cases, the Indonesian words are 
listed in italics. 

Not all of the words on this list hold up to close scrutiny. Some, particularly those which are found 
only in Indonesian and not in any AN proto-language, should be excluded from any list of potential AT 
vocabulary. In this study, the acceptability of cognates is judged on three conditions: (1) Can the
comparisons be reconstructed to at least one primary-level protolanguage in both AN and KD?; (2) 
Based on our current understanding of sound correspondences, are the comparisons regular?; and (3) 
Are the proper syllables being compared between the two groups (specifically, do the KD 
monosyllables correspond to AN final-syllables)? Based on these criteria, the words for the numerals 
‘one’ through ‘ten’, ‘water’, ‘cry/weep’, ‘star/sun’, ‘eat’, ‘raw’, ‘this’, ‘bird’, ‘nose’, ‘grandfather’, 
‘fire’, ‘head’, ‘tooth’, ‘die’, ‘rain’, ‘eye’, ‘black’, ‘moon’, ‘fart’, and ‘sour’ appear to pass scrutiny. The 
others do not, and an explanation for why they are ruled out as valid comparisons is given in the
remainder of section 2.1.

Table 1: Potential AT etyma from Benedict 1942 with glosses, Proto-An, and Malay words


Gloss          An lexeme        Gloss         An lexeme
bird           *manuk           hair          rambut
black          *domdom          head          *qulu
blind          *buCa            lungs         *pusuq
blood          *daRaq           man           *Cau
boat           perahu           moon          *bulaN
body           daging           nest          sarang
bone           tanduk           night         *Rabiqi
breast         *susu            nine          *Siwa
cry/weep       *Canis           nose          *ijun
day/sun        *waRi            one           *isa
die            *m-atay          rain          *Rabun/*ambun
door           pintu            raw           *qudip
ear            *Calina          rice field    *bona
eat            *kaon            seven         *pitu
eight          *walu            six           *onom
eye            *maCa            small         *qitik
fart           *qotut           sour          *qasom
fat/oil        minyak           star/sun      *qajaw
father         bapak            ten           *sa-puluq
fire           *Sapuy           this          *ni
five           *lima            three         *tolu
flower         *buna            tooth         *[n/n/l]ipan
foot           *qaqay           two           *duSa
four           *Sopat           water         *danum
grandfather    *ompu (PMP)      yellow        kuning
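
The three acceptability conditions applied to Table 1 can be sketched as a simple filter. This is a minimal illustration rather than a procedure taken from the study: the `Comparison` record and its field names are hypothetical, the judgments entered below restate decisions reported in sections 2.1.1 to 2.1.3, and any condition the text does not discuss for a given word is set to True purely for the sake of the example.

```python
from dataclasses import dataclass


@dataclass
class Comparison:
    """One proposed AN-KD comparison (hypothetical record layout)."""
    gloss: str
    an_form: str
    reconstructible: bool  # (1) reconstructible to a primary-level proto-language in AN and KD
    regular: bool          # (2) sound correspondences are regular
    final_syllable: bool   # (3) the KD monosyllable is matched with the AN final syllable


def is_acceptable(c: Comparison) -> bool:
    """A comparison passes only if all three conditions hold."""
    return c.reconstructible and c.regular and c.final_syllable


# Judgments restated from section 2.1; undiscussed conditions default to True.
candidates = [
    Comparison("bird", "*manuk", True, True, True),
    Comparison("ear", "*Calina", True, True, False),   # compared with the AN penult
    Comparison("boat", "perahu", False, True, True),   # found in Indonesian only
    Comparison("blood", "*daRaq", True, False, True),  # irregular *q > t
]

accepted = [c.gloss for c in candidates if is_acceptable(c)]
print(accepted)
```

Treating the conditions as a conjunction mirrors the way the section rules words out: failing any single condition is enough to exclude a comparison.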


2.1.1 Kra-Dai words compared with Austronesian penultimate syllables 

A major difference between AN and KD is canonical word shape. AN languages are mostly disyllabic, 
a canonical word shape that can be reconstructed to PAN. KD languages are mostly monosyllabic or 
sesquisyllabic, although Ostapirat (2018) argues that PKD can be reconstructed with fully disyllabic 
words with internal evidence despite a total lack of disyllabic canonical words in modern KD languages. 
The shift to monosyllabicity in KD appears to be the natural consequence of ultimate-syllable stress, 
which results in the reduction and eventual deletion of the unstressed penultimate syllable but no 
reduction or deletion in final syllables. The result of this reduction is that monosyllabic words in KD 
correspond to the ultima in PAN, never the penult. Benedict, however, tended to compare KD 
monosyllables to both penultimate and ultimate syllables in AN at his own convenience. This practice 
allowed him to make many additional comparisons, but since the nature of penultimate syllable 
reduction as a consequence of stress placement on the ultima is well understood, the validity of 
comparisons between KD monosyllables and AN penultimate syllables is questionable. Therefore, only 
comparisons that match KD monosyllables with AN final syllables are considered valid. The following 
PAN reconstructions in example 1, which Benedict connects to KD vocabulary, are considered invalid 
because Benedict compared their penultimate syllable to his suspected KD cognates.

1   *Calina ‘ear’       *waRi ‘day/sun’
    *buna ‘flower’      *qaqay ‘foot’
    *buCa ‘blind’       *Rabiqi ‘night’
    *pusuq ‘lung’
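
The syllable-matching condition can be illustrated with a small helper that isolates the final syllable of a reconstructed AN word, which is the only part a KD monosyllable is expected to continue. This is a rough sketch under simplifying assumptions: forms are typed as they are printed in this paper, every letter is treated as one segment, only the letters a, e, i, o, u, and ə count as vowels, and the function name `ultima` is introduced here for illustration.

```python
VOWELS = set("aeiouə")  # assumption: the glides y and w are treated as consonants


def ultima(form: str) -> str:
    """Return the final syllable of a reconstructed (C)V(C)-type word."""
    segs = form.lstrip("*").replace("-", "")
    vowel_positions = [i for i, seg in enumerate(segs) if seg in VOWELS]
    if not vowel_positions:
        raise ValueError(f"no vowel found in {form!r}")
    last = vowel_positions[-1]
    # The onset is the single consonant directly before the last vowel, if any.
    has_onset = last > 0 and segs[last - 1] not in VOWELS
    return segs[last - 1:] if has_onset else segs[last:]


for pan in ["*Calina", "*waRi", "*buna", "*qaqay", "*buCa", "*Rabiqi", "*pusuq"]:
    print(pan, "->", ultima(pan))
```

Under this sketch, a KD form is a candidate match only for the returned string, so comparisons that rely on the preceding syllable, as in example 1, would be rejected.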


2.1.2 Words that are restricted to Indonesian and closely related languages 

Other words can be discarded because they involve comparisons of KD words directly with Indonesian. 
This set of words is listed in example 2. In these cases, the Indonesian words cannot be reconstructed 
to a higher-level proto-language (PMP or PAN): 


2   bapak ‘father’      kuning ‘yellow’
    perahu ‘boat’       rambut ‘hair’
    minyak ‘fat/oil’    pintu ‘door’
    sarang ‘nest’       tanduk ‘horn’ (listed as bone in Benedict 1942)


2.1.3 Words with additional issues in correspondences, semantics, and attestation 
Three additional words, *daRaq ‘blood’, *qitik ‘small’, and *bana ‘rice field’, are problematic due to
their correspondences, semantics, and attestations. Details are shown in example 3.


3 *daRaq ‘blood’ - This comparison assumes a change of *q > t in KD, for example, PT *luot
and/or Lakkja lie:t’’, which is irregular (*q typically merges with *k in this position).
*qitik ‘small’ - Benedict’s comparison is based only on Lati, but the Lati word could not be
verified. Even so, comparison with a single language violates the reconstructability
requirement.
*bana ‘rice field’ - This word compares KD with AN words meaning ‘river mouth’ or ‘lower
part of river/tidal bore’. The semantics are considered too different to reconcile.


2.2 Ostapirat (2005 and 2013) 

Besides Benedict, Ostapirat has contributed much to the field of AT studies and has introduced many
additional suspected cognates. A combined list from Ostapirat 2005 and 2013 is given in Table 2,
with the PAN reconstructions according to the original sources and modified wherever necessary to
reflect current PAN orthographical practices. This is a simple list of all words that Ostapirat includes in
his studies, and therefore several overlaps exist between this list and the list in Table 1.


5 This comparison (*pusuq) also equates ‘lung’ with PAN *pusuq ‘heart’, with dubious semantics.

Table 2: Suspected AT etyma from Ostapirat 2005 and 2013 with gloss and PAN reconstruction


Gloss             PAn               Gloss         PAn
bear              *Cumay            moon          *bulaN
bird              *manuk            navel         *pudoR/puja
bitter/pungent    *paqiC            net           *aray
black             *tidom            nine          *Siwa
boat              *aluja            nose          *ijun
borrow/lend       *Sozam            one           *isa
centipede         *qali-Sipan       otter         *Sanaq
chaff/bran        *qopah            plant         *mula
child             *aNak             rain          *quzan
clam/snail        *(kuSul)          raw/alive     *qudip
cry/weep          *Canis            saliva        *najay
die               *m-aCay           sesame        *lona (PMP)
eat               *kaon             shoulder      *qabaRa
eight             *walu             shrimp        *qudan
excrement         *Caqi/*Caki       skin/scale    *kuliC
eye               *maCa             sour          *qasom
fart              *qotut            stream        *qaRus (PMP ‘current’)
fat/oil/grease    *SimaR            sun/star      *qajaw
fire              *Sapuy            taro          *biRaq
flow/current      *qaluR            ten           *sa-puluq
grandmother       *aya              this          *ni
grandparent       *ompu (PMP)       tongue        *Soma
hand              *qalima           tooth         *[n/n/l]ipon
head              *qulu             two           *duSa
I                 *aku              water         *daNum
leg/thigh         *paqa ‘thigh’     you           *Simu
louse             *kuCu


2.2.1 Questionable comparisons 
Ostapirat (2005) additionally includes the following comparisons in example 4 which are found in 
Atayalic, an AN primary branch, but which are not reconstructed to PAN. 


4 yawn Proto-Atayal *surab 
mouse Proto-Atayal *qawlid 
leaf Proto-Atayal *?abag 


These three comparisons are all used to demonstrate word-final voiced stop correspondences between 
KD and AN. There are two issues with these comparisons. First, the words themselves cannot be 
reconstructed to PAN with internal evidence. Although the presence of KD cognates theoretically allows 
for reconstruction without additional AN evidence under the AT hypothesis, the words in question 
compete with much more robustly attested PAN reconstructions *Suab ‘to yawn’, *labaw ‘mouse’, and 
*waSaw ‘leaf’, each with evidence from multiple AN primary branches. The second issue is 
methodological. Many proposed KD-AN cognates and sound correspondences are evidenced by 
multiple witnesses. The proposals in 4, however, are not corroborated with any second witnesses. The 
correspondence between PAN *-b and suspected KD cognates, for example, relies on this single lexeme. 
Considering the confinement of AN examples of potential word-final voiced stops to Atayal, the
competition between Proto-Atayal reconstructions and more robustly attested PAN reconstructions, and
the lack of second witnesses for the proposed word-final correspondences, these words are left off
the list. Note that Ostapirat (2005:120) also points out that these comparisons are tentative.

An additional questionable comparison is *aray ‘net’, which Ostapirat reconstructs based on the
comparison between PAN *aray ‘net’, Tai hee, Kam-Sui re, and Hlai ra.j. The source of the PAN
reconstruction is unknown, however, so it is difficult to analyze the potential relationship between these
words. It is therefore left off the list unless it can later be verified with an AN source.


2.3 Listing established comparisons 

Next, the various comparisons from Benedict (1942) and Ostapirat (2005; 2013) are combined into a
single list. The resulting list, shown in Table 3, contains 52 likely shared lexemes between AN and
KD. Reconstructed forms reflect their AN reconstructions.


Table 3: Combined list of KD-AN etyma from Benedict (1942) and Ostapirat (2005; 2013) with 
glosses and PAN reconstructions 


Gloss          PAN         Gloss          PAN         Gloss          PAN         Gloss        PAN
bird           *manuk      fart           *qotut      moon           *bulaN      sour         *qalosom
alive; raw     *qudip      fire           *Sapuy      navel          *puja       sun; star    *qajaw
bear           *Cumay      five           *lima       nine           *Siwa       thigh        *paqa
bitter         *paqiC      flow           *qaluR      nose           *ijun       this         *i-ni
black; dark    *domdom     four           *Sopat      one            *isa        three        *talu
boat           *aluja      grandparent    *ompu       otter          *Sanaq      to cry       *Canis
chaff          *qopah      grease         *SimaR      saliva         *qajay      to die       *m-aCay
child          *aNak       hand           *qalima     sesame         *lona       to plant     *mula
current        *qaRus      head           *qulu       seven          *pitu       tongue       *Soma
eat            *kaon       I              *aku        shoulder       *qabaRa     tooth        *[n/n/l]ipon
eight          *walu       lend           *Sozam      shrimp         *qudan      two          *duSa
excrement      *Caqi       louse          *kuCu       six            *onom       water        *daNum
eye            *maCa       mist           *Rabun      skin; scale    *kuliC      your         *kamu


2.3.1 Notes on irregular comparisons 

In addition to the 52 words in Table 3, there are six additional words which are frequently cited as being 
unproblematic AT vocabulary, but which nevertheless have some irregularities which should be 
mentioned. These six additional comparisons are shown in Table 4 with a more detailed discussion of 
each lexeme afterward. Inclusion in this list does not imply that these comparisons should be removed, 
but simply that there are some irregularities which may be explained better in future work.

Table 4: Proposed AT etyma from Ostapirat (2005) with irregularities


Gloss               PAN
clam; snail         *kuSul/(*-kul) ‘clam; snail’
fat; grease; oil    *SimaR ‘fat; grease; oil’
grandmother         *aya ‘grandmother’
rain                *quzan ‘rain’
taro                *biRaq ‘taro’
ten                 *puluq ‘ten’


*puluq ‘ten’ - The first irregular comparison involves *puluq ‘ten’. Most basic KD and AN numerals
are straightforwardly connected. For example, there is good evidence that the numerals ‘one’ to 
‘nine’ are shared by KD and AN. In Buyang, for example, the numerals ‘two’ to ‘nine’ are clearly 
related to their PAN counterparts, as shown in example 5. 


5 Buyang PAN 
ca’! ‘two’ *duSa 
tu“! ‘three’ *talu 
pa”! ‘four’ *Sopat 
ma” ‘five’ *lima 
nam! ‘six’ *onom 
tu’? ‘seven’ *pitu 
du’? ‘eight’ *walu 
va"! ‘nine’ *Siwa 


The word for the numeral ‘ten’, however, differs in the presence of a seemingly irregular reflex of *q. 
Comparisons of ‘ten’ include the following in example 6. 


6 PAN *puluq PH *apu:© Paha vat?!/ pwat”! Buyang put”!


The assumed history of this word is as follows: First, the penultimate vowel deletes, causing an
intermediary *pl- cluster in PKD daughter languages. This simplified to *p as in Paha vat”! and Buyang
put”'. However, there are still issues with this comparison. PAN *-q corresponds to -k in KD: Paha
naak"' ‘otter’ : PAN *Sanaq ‘river otter’ and Paha taak*’ ‘vomit’ : PAN *utaq ‘vomit’. Ostapirat (2005)
postulates that the word-final sequence *-uq undergoes a fronting process: *-uq > *-uiq > *-uC (-ut in
Kra). This condition is unexpected, since word-final [q] tends to have a lowering and backing effect on
preceding vowels, not a fronting effect. Another possible comparison with *-uq, PAN *-tuq ‘to fall’ :
PH *?tuk ‘to fall’, Paha tok’, Buyang tuk’, S. Kam tok*’, may indicate that *q from *-uq merged with
*-k just as it did after other vowels. If this is true, then the word-final t in this comparison diminishes its
strength. Sagart (2010) proposes that -t is from a linker which is now fused on the KD root, although
the Austronesian equivalent is only found in Philippine languages.


*aya ‘grandmother’ - This word is reflected as ‘grandmother’ in KD, but probably referred to one’s 
paternal aunt/paternal aunt’s husband in PAN. This reconstruction holds for MP, but Formosan 
reflexes mean either ‘mother’ (Taokas) or ‘mother; mother’s sister’ (Atayal). At any rate, it is 
possible that apparent comparisons between KD ‘grandmother’ and AN ‘paternal aunt/paternal 
aunt’s husband’ underwent semantic differentiation. 


*quzan ‘rain’ - Only one comparison is shown in KD, but this comparison has an l reflex of word-final
*-N, which is considered a result of the short vowel reflex in Laha Gal. There are not many words
that can be used to test if this hypothesis is true. The only other reflex of a word-final -N is from
*bulaN, where it is reflected with -n and the vowel is long. Medial *z does not provide additional
insight, since there is no overlap between reflexes of *Sozam ‘to borrow’ and *quzan ‘rain’.
Additionally, *z may be affected by the preceding *u, making direct comparison between reflexes
of *z from *Sozam and *z from *quzan difficult.


*biRaq ‘taro’ - Reflexes of *biRaq in KD pose issues because of the irregular devoicing of word-initial 
*b in PT *prumok and Paha pwaak"’. It is possible, however, that devoicing occurs due to the 
intermediate *br- cluster from PKD. There is not enough data to tell if this is regular, however. 


*kuSul ‘clam; snail’ - Ostapirat reconstructs *kuSul from PMP *kuhul with additional KD evidence:
Tai ha2i, Kam-Sui khuj, Hlai tshei, Kra ci. He points to the typical *S > *h sound change which
took place between PAN and PMP, suggesting that, although there is no Formosan evidence,
the PAN reconstruction can be validated with KD evidence. This comparison may be correct,
although there is no other corroborating evidence for the -l : -j correspondence between AN and
KD. If *-l merged with *-R, then the change may have followed suit.


*SimaR ‘grease; fat; oil’ - Two issues are found with this comparison. First, Tai reflexes of words 
with high vowels in the penultimate syllable and the low vowel in the final syllable typically have 
high-vowel reflexes (*a > *w1): PAN *bulaN : PT *6luron“, PAN *tubah : PT *C.bura“, PAN *puja
: PT *dwut:*. The low-vowel reflex, Tai man, is therefore unexpected. Second, there is no other 
corroborating evidence to suggest that *-R regularly became KD -n. The only other comparison 
with word-final *R is from *qaluR, where *-R became -j in KD (PT **lwaj‘). 


3 New comparisons 

In the following section, additional suspected cognates are presented which are the result of recent 
research into the relationship between KD and AN. The goal of listing these comparisons is to expand 
the list of KD-AN cognates, which should allow for more accurate comparisons and descriptions of 
sound correspondences. Since there are relatively few suspected cognates at the current stage of AT
research, the most important task at this stage is to expand the list as much as possible while
maintaining a strict adherence to the comparative method.

The suspected cognates are listed in Table 5 as individual AT etyma with the current PAN
reconstructed forms and updated definitions based on both AN and KD. After Table 5, each suspected
AT lexeme is presented with examples from both AN and KD and further discussions and explanations 
where necessary. 


Table 5: New Austro-Tai etyma 


Gloss                       PAN        Gloss                 PAN        Gloss              PAN
afraid; timid; fear         *talaw     uncertainty marker    *nu        to come; arrive    *daton
derris root; fish poison    *tubah     rattan                *quay      to fall            *-tuq
fish hook                   *kabit     shadow                *qaNinu    to sell            *saliw
hold in the fist            *kamkom    sick                  *sakit     to transplant      *Canom
leech                       *moCak     spotted               *bolan

*talaw ‘afraid, timid, fearful’ 

In AN, there are two reconstructions with overlapping meaning: *takut ‘fear’ and *talaw ‘timid; fearful; 
coward’. The first tends to refer to the feeling of fear, whereas the second is a description of a person’s 
actions or characteristics. Reflexes in KD tend to simply refer to the state of being afraid. There may 
not have been a strict distinction between ‘afraid’ and ‘coward’ in AT with both being attributed to the 
word *talaw. 


AN examples: PAN *talaw ‘timid; fearful; coward’, Amis talaw ‘to be afraid’, Itbayaten taxaw
‘cowardliness’, Cebuano talaw ‘to back off; be afraid to do s.t.’, Singai taru ‘to be
afraid’, Aoheng tao ‘to be afraid’.
KD examples: PT *la:w4 ‘afraid’, Bao Yen la:w*! ‘afraid’, Qabiao la:w”*, Mak Ju!, S. Kam jaw!


*quay ‘rattan’ 
Reflexes of *quay are restricted to Tai in KD, but this may be a product of sampling, since many 
resources do not include ‘rattan’ in basic vocabulary. 


AN examples: Pazeh Puay, Amis ?oway, Cebuano uwdy, Singai ui, Aoheng ui
KD examples: PT *C.wa:j* ‘rattan’, Siamese wa.j*', Sapa va.j*!, Lungchow va:j*', Yay va:j*! 


*Canom ‘to transplant a crop; plant; bury’

KD evidence allows for a distinction between *mula ‘to plant’ and *Canom ‘to transplant’. In AN, the 
additional meaning of ‘to bury’ is applied to *Canom, but this appears to be a semantic extension not 
present in KD. 


AN examples: Amis tadam ‘grave; tomb’, Cebuano tanim ‘to plant’, Malay tanam, Chamorro tanom
KD examples: PT *t.nam* ‘to transplant’, Siamese dam“', Lungchow dam*', Saek tram*', Paha tam“!
‘to plant’, Buyang 2dam”*. 


*tubah ‘derris root; fish poison made from the derris root’ 

This comparison involves semantics which may not be obviously linked. The derris root is commonly 
ground up into a substance which is used to poison fish in a contained body of water, traditionally used 
to gather large quantities of fish with little effort. The use of the term as ‘fish poison’ in Tai, but as the 
name for the plant in PAN, is not unexpected. Like *quay, reflexes of *tubah are restricted to Tai. This 
is also most likely a product of sampling, since most resources do not include entries for ‘derris root’ 
or ‘fish poison’. 


AN examples: Saisiyat ta-toba? ‘fish poison’, Pazeh ta-tuba ‘derris poison’, Itbayaten tova ‘plant used
for fish poison’, Cebuano tiba ‘kind of croton plant’, Malay tuba ‘derris root’ 
KD examples: PT *C.buto“ ‘fish poison’, Siamese bwa“!, Sapa bwA!, Lungchow bu:“!, Saek via"! 


*komkom ‘fist; hold in the fist’ 

AN examples: Itbayaten kamkam ‘handful’, Cebuano kumkum ‘hold something in the hand’, Simalur
xankam ‘a closed handful’, Kaidipang koygomo ‘to hold in the fist’

KD examples: PT *kam‘4, Siamese kam*!, Sapa kam*!, Lakkja kam’, Qabiao kam?”


*datay ‘to arrive; reach a place’ 

AN examples: Itbayaten ratay ‘arrival’, Ilokano dataiy ‘arrival’, Malay datay ‘to come’, Sasak datayn
‘to come; arrive’ 

KD examples: PT *C.tyn‘ ‘arrive’, Siamese ¢/wuy“!, Sapa t'wy*', Lungchow t'vy"', Saek tiay*”, POB 
*dfon!, Lakkja tay’*! ‘come’, S. Kam tey”’ ‘come’ 


*qaNinu ‘shadow’ 
AN examples: Kavalan niyu, Bunun ganiyu, Itbayaten anino, Cebuano aninu, Bimanese ninu
KD examples: PT *naw4, Siamese yaw**, Bao Yen yrw*?, Yay yaw*?, PH aya:u‘, Paha yau” 


*kabit ‘fishing hook’ 

AN examples: Pazeh kabit, Amis kafit, Tagalog kabit, Iban kabit

KD examples: PT *6et? ‘fish hook’, Siamese bet?S!, Cao Bang bet?S!, Sapa bit?S!, Bao Yen brt?*!, 
Lungchow bit?S! 


*maCaq ‘paddy leech’ 
AN examples: Amis fakintaq, Kanakanabu nimaca?a, Ilokano alintd, Malay lintah, Singhi rimotah
KD examples: PT *da:k? ‘leech’, Siamese t*a:k>'*, Bao Yen ta:k?!?, Yay ta:k?!?, POB *da:k?!, PH
*?ta:k, Paha na*!taak*’, Lakkja la:k>'? (compare *maCay : Lakkja ple/)


*saliw ‘to sell’ 
AN examples: Amis caliw, Ivatan mapasaliw, Ilokano saliw, Proto-Sangiric saliu
KD examples: PH *aRi:u°, PK *s-ywi 


*nu ‘marker of uncertainty’, *i-nu ‘where’, *a-nu ‘who’ 

AN examples: Seediq ma-nu ‘what; which’, Thao mi-ni ‘why’, Itbayaten di-no-h ‘where’,
Kayan hi-no? ‘where’

KD examples: Paha nau ‘who’, Mulam nau? ‘who’, N. Kam nau ‘who’ / 20”nau”* ‘where’, Qabiao 
njau® ‘who’ 


*-tuq ‘fall’ (a monosyllabic root) 

This is the only comparison here which includes a monosyllabic root. In AN studies, monosyllabic roots 
are typically -CVC segments that appear at the end of a word and are similar to phonaesthemes in that 
they are not lexical but seem to have recurring meanings where they occur (Brandstetter 1916; Blust 
1988). The AN examples therefore come from different lexemes which utilize this root but do not 
necessarily descend from a single PAN source. The existence of monosyllabic roots in AN with 
comparisons in KD may indicate the presence of monosyllabic words in PAT which were modified in 
AN to conform to a disyllabic requirement, creating irregularities in pre-final syllables but a common 
monosyllabic word-final root. 


AN examples: Bintulu gatu?, Malay rentoh, Kelabit tutu?, Berawan sito
KD examples: PH *?tuk ‘fall’, PK *tok? ‘fall’, Buyang tuk! ‘fall’, S. Kam tok°> ‘fall’ 


*sakit ‘sick; to hurt; be in pain’ 

Both *sakit and *bolan are restricted to MP in AN, which poses an issue for these comparisons.
However, unlike Proto-Atayal *surab ‘yawn’, *qawlid ‘mouse’, and *?abag ‘leaf’, which compete with
more robustly attested PAN reconstructions, *sakit ‘sick’ does not have a PAN word with which it
competes. This may be due to the loss of reflexes of a putative PAN *sakit in Formosan languages but
its retention in MP and KD. *bolan, however, competes with PAN *paCak ‘spotted’ and is therefore
more problematic, although the sound correspondences between PT *6la:n and PMP *bolan are regular.


AN examples: Ilokano sakit, Tagalog sakit, Malay sakit, Karo Batak sakit
KD examples: PT *ke:t? ‘to hurt’, Shangsi ket?S!, Yay cet?S!, Saek ke.1?S!, Lakkja we.” ‘to hurt, 
ache’, N. Kam kit*? 


*bolan ‘spotted’ 
AN examples: Ngaju Dayak balay, Malay balay, Sasak balay, Makassarese ballay
KD examples: PT *6la:n® ‘spotted’, Siamese da.y®', Sapa ba:y®', Bao Yen bja:n®!, Cao Bang da:y?! 


With the additional comparisons listed above, a list of 71 comparisons is presented in Table
6. The list will undoubtedly change as additional comparative research is conducted on the two families.
However, the number of apparently valid PAT lexemes with reflexes in both KD and AN continues to
grow, and further work is more likely to raise this number than to lower it.

Table 6: Combined list of AT etyma from Benedict, Ostapirat, and the present research with gloss and
PAN reconstruction


Gloss          PAN         Gloss               PAN         Gloss                 PAN
afraid         *talaw      hand                *qalima     skin; scale           *kuliC
alive; raw     *qudip      head                *qulu       sour                  *qalosom
bear           *Cumay      hold in the fist    *kamkom     spotted               *bolan
bird           *manuk      I                   *aku        sun; star             *qajaw
bitter         *paqiC      leech               *moCak      taro                  *biRaq
black; dark    *domdom     lend                *Sozam      ten                   *puluq
boat           *aluja      louse               *kuCu       thigh                 *paqa
chaff          *qopah      mist; cloud         *Rabun      this                  *i-ni
child          *aNak       moon                *bulaN      three                 *talu
clam; snail    *kuSul      navel               *puja       to come; arrive       *daton
current        *qaRus      nine                *Siwa       to cry                *Canis
derris root    *tubah      nose                *ijun       to die                *m-aCay
eight          *walu       one                 *isa        to eat                *kaon
excrement      *Caqi       otter               *Sanaq      to fall               *-tuq
eye            *maCa       rain                *quzan      to fart               *qotut
fire           *Sapuy      rattan              *quay       to plant              *mula
fish hook      *kabit      saliva              *qajay      to sell               *saliw
five           *lima       sesame              *lona       to transplant         *Canom
flow           *qaluR      seven               *pitu       tongue                *Soma
four           *Sopat      shadow              *qaNinu     tooth                 *[n/n/l]ipon
fresh water    *daNum      shoulder            *qabaRa     two                   *duSa
grandmother    *aya        shrimp              *qudan      uncertainty marker    *nu
grandparent    *ompu       sick                *sakit      you                   *kamu
grease; fat    *SimaR      six                 *onom


4 Notes on the vowel correspondences 

With the establishment of a discrete list of AT comparisons, the discussion may now turn to the issue 
of correspondences between AN and KD. Since Ostapirat (2005) has done much work on the 
correspondence between consonants, the following section will mostly focus on correspondences
between the vowels. Importantly, it is shown that there are regular reflexes of final-syllable vowels
which appear to be conditioned by both the quality of the preceding vowel (assuming that PAT was 
disyllabic, see Ostapirat 2018), and by the coda. 


4.1 The low-vowel *a 

Although Ostapirat (2005) focused on consonant correspondences between AN and KD, he does make 
a specific statement on vowel correspondences in Ostapirat (2013). In that study, it was suggested that 
PKD distinguished between *a: and *y:, and that these vowels both correspond to PAN *a. Under this
hypothesis *a: and *y: merged in PAN but remained distinct in KD. In Tai, however, differential reflexes 
of *a appear to be conditioned by the height of the preceding penultimate vowel. Where *a is preceded 
by a high vowel, it has a high-vowel reflex in Tai, either *t or *tuo. Where *a is preceded by a non- 
high vowel, it has a low-vowel reflex in Tai, *a:. Examples are organized in Tables 7 and 8. Table 7 
shows high-vowel reflexes in Tai, and Table 8 shows low-vowel reflexes. 


Table 7: High-vowel reflexes of *a in Tai 


            moon        derris root    vomit       boat                navel      bear       hand
PAN         *bulaN      *tubah         *utaq       *aluja (paddle)     *puja      *Cumay     *qalima
PT          *bluran’    *C.buro“       *rwurak?    *C.rwuio* (boat)    *dwoo:’    **mwuj*    *mwar:4
Siamese     duron*!     bu"!           take?       ra                  dur:“!     mi?        mut’?
Sapa        burton“!    bu"!           ha???       hur’?               dur*!      mi*!       mut?
Bao Yen     buon‘!      buro*!         rake        lua”?               -          mi:*!      fn
Cao Bang    burn‘!      buro*!         ra:k??      lua”?               duo“!      mi‘!       mu?
Lungchow    by:n4!      bur:4          fa:kPl?     jist                -          mi:“!      mut?
Shangsi     bun‘!       -              luk?!       lu’?                -          muj*!      moj*”
Yay         duro“!      -              ruok?l?     rua’?               duo“!      mutoj"!    fun’?
Saek        blion*!     vio"!          rugk?!?     rua”!               duo“!      muraj*!    mu:”?


Table 8: Low-vowel reflexes of *a in Tai


            otter       eye         die         leg         afraid
PAN         *Sanaq      *maCa       *m-aCay     *paqa       *talaw
PT          *na:k?      *p.ta:4     *p.taj’     *p.qa:4     *tla:wA
Siamese     na:k?’?     ta:4!       taj“!       khg:A!      -
Sapa        hae         tar         taj“!       xaAl        -
Bao Yen     na:k?!?     phja:“!     phajA!      kha“!       la:w4!
Cao Bang    na:k?!?     tha:A!      tha:jA!     kha“!       law'!
Shangsi     na:k?’?     tha:A!      tha:j”!     ha:“!       -
Yay         make        taet        taj“!       ka“!        la:w’!
Saek        fatk?       pra         pra:j*!     kwa:“!      la:w*!


It also appears that a palatal consonant in the onset of a final syllable may trigger high-vowel reflexes
in Tai, for example, PAN *Sozam [sə'dʒam] : PT **jur:m4 ‘to borrow’. There is at least one exception
to this, PAN *aNak ‘child’, which corresponds with PT *lur:k. Other than this single exception, the
above conditions play out in numerous comparisons.
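
The conditioning described above for Tai can be restated as a small decision rule. The sketch below is only an illustration under stated assumptions: forms are typed as they are printed in this paper, one letter counts as one segment, `z` is the only palatal onset included because it is the only one exemplified in the text, and the function name and the labels "high" and "low" are hypothetical.

```python
VOWELS = set("aeiouə")
HIGH_VOWELS = set("iu")
PALATAL_ONSETS = set("z")  # assumption: only *z is exemplified in the text


def expected_tai_reflex(pan: str) -> str:
    """Predict 'high' or 'low' for the Tai reflex of final-syllable *a."""
    segs = pan.lstrip("*").replace("-", "")
    vowel_positions = [i for i, seg in enumerate(segs) if seg in VOWELS]
    if len(vowel_positions) < 2 or segs[vowel_positions[-1]] != "a":
        raise ValueError(f"{pan!r} needs two or more syllables and final-syllable *a")
    penult_vowel = segs[vowel_positions[-2]]
    onset = segs[vowel_positions[-1] - 1]
    # High reflex after a high penultimate vowel or a palatal final-syllable onset.
    if penult_vowel in HIGH_VOWELS or onset in PALATAL_ONSETS:
        return "high"
    return "low"


for pan in ["*bulaN", "*qalima", "*Sozam", "*Sanaq", "*talaw", "*aNak"]:
    print(pan, expected_tai_reflex(pan))
```

Note that the rule returns "low" for *aNak, whereas the attested PT reflex is high; this is the single exception mentioned above, and the sketch makes no attempt to account for it.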

Outside of Tai, reflexes of *a are less consistently conditioned by the preceding vowel. Some words 
which have regularly conditioned high-vowel reflexes of *a in Tai, such as reflexes of *lima ‘five’ and 
*aluja ‘paddle; boat’ have low-vowel reflexes in Hlai, ma: and ra: respectively. However, *a is 
consistently reflected with a long vowel in subgroups outside of Tai, even where Tai has a high-vowel 
reflex: PAN *Sanaq : PH *ona:k : Paha na:k"”. 
Following Ostapirat (2018), who reconstructs fully disyllabic PKD, I view the split of *a into *a: 
and *ɯ: as a change that arose through parallel developments after the breakup of PKD, driven by 
interactions between penultimate and ultimate syllable vowels. In Tai, there was a regular raising of *a 
after a penultimate high vowel. In other subgroups, raising is less regular, which itself attests to the 
parallel nature of these developments. 

Another piece of evidence that the split of *a in KD subgroups is a parallel innovation is that 
outside of Tai, it is common for the penultimate vowel features to spread to the onset of the final 
syllable, if the penultimate vowel is high (Ostapirat 2018). For example, POB *zuayn : PAN *qudan 
‘shrimp’, POB *duak : PAN *utaq ‘vomit’. In these two examples, the high back vowel *u has spread 
its features onto the final-syllable vowel and was later deleted. Penultimate vowel feature transfer is apparent 
in some Tai reflexes, for example, *rwuok? ‘to vomit’, but not others, for example, *C.6u0* ‘fish 
poison’. High vowels therefore had an effect on final-syllable vowel reflexes, but these are not uniform 
across subgroups, suggesting parallel development. 

To summarize, the development of *a in final syllables is complex. In neutral environments where 
the penultimate vowel was non-high, *a consistently lengthens to *a:. Where there is a high vowel in 
the penultimate syllable, however, Tai has a regular raising of the final vowel, but other subgroups have 
a mixture of lengthening (*a > *a:) and raising/feature transfer (*iCa > Cia/Ci and *uCa > Cua/Cu). It is 
therefore likely that PAT had a single low vowel *a, which first underwent lengthening in PKD (*a > 
*a:) and later underwent a series of parallel splits. These splits were typically conditioned by the height 
of the preceding penultimate vowel and co-occurred with the eventual loss of the penultimate syllable in 
most KD branches. 
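
The Tai conditioning summarized above can be sketched as a small rule. This is only an illustration and not the author's procedure: the function names, the flat transcription of PAN forms (written as in Tables 7 and 8), and the vowel inventory used for parsing are assumptions made for the example. The sketch models only the penultimate-vowel condition, so it ignores the palatal-onset condition and the exception *aNak ‘child’.

```python
# Schematic model of the Tai reflex class of final-syllable *a.
# Assumption: PAN forms are plain strings and only a, i, u, ə count as vowels.
VOWELS = set("aiuə")
HIGH_VOWELS = {"i", "u"}


def penultimate_vowel(form: str) -> str:
    """Return the vowel of the second-to-last syllable of a reconstructed form."""
    vowels = [ch for ch in form.lstrip("*") if ch in VOWELS]
    if len(vowels) < 2:
        raise ValueError(f"{form!r} has no penultimate vowel")
    return vowels[-2]


def tai_reflex_class(form: str) -> str:
    """Classify the expected Tai reflex of final *a as 'high' or 'low'."""
    return "high" if penultimate_vowel(form) in HIGH_VOWELS else "low"


if __name__ == "__main__":
    # Forms taken from Tables 7 and 8.
    for pan in ["*maCa", "*Sanaq", "*bulaN", "*utaq", "*qalima", "*Cumay"]:
        print(pan, tai_reflex_class(pan))
```

Under these assumptions, forms with a non-high penult (*maCa, *Sanaq) fall in the low-vowel class, and forms with a high penult (*bulaN, *utaq, *qalima, *Cumay) fall in the high-vowel class, matching the division between Table 8 and Table 7.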


4.2 Reflexes of *ə 

The central vowel schwa also undergoes a split, but reflexes of schwa do not show a clear condition. In 
both KD and AN, a mid/low central vowel shares irregular reflexes with the high back vowel *u, both 
apparently from PAT *ə. To begin, PAN *ə typically corresponds to PT, PH, POB, and Lakkja /a/ and 
to PK and Kam /ə/ (/e/ in S. Kam, /ə/ in N. Kam). These are shown in Table 9 (only reflexes of the 
vowels are listed, while the words themselves are in the Appendix). 


Table 9: Typical correspondence sets involving PAT *ə 
PAn | PT | PH | POB | PK | Lakkja | S Kam 


5) a - - 5) - - *tanom ‘plant’ 

a) a - - a *komkoam ‘hold in fist’ 
a) a a a ) a e *domdom ‘black’ 

5) a a - 5) a i *ipon ‘tooth’ 


It is therefore assumed that PAT *ə became PAN *ə, PKD *ə, PT *a, PH *a, POB *a, PK *ə, Lakkja a, 
and S. Kam e. There are several examples, however, where PAN *ə corresponds unexpectedly to a back 
or central vowel. The unexpected reflexes are highlighted in Table 10. 


Table 10: Irregular correspondence sets involving PAT *ə and KD reflexes 


PAn | PT | PH | POB | PK | Lakkja | S Kam 
5) - u - 3 - - *onom ‘Six’ 
3 ¥ - - - u B *qasom ‘sour’ 
3 ¥ - 3 u a e *daton ‘arrive’ 


These irregularities cross over into AN as well, where there are several cases of PAN *u irregularly 
corresponding to what are typically reflexes of *ə in KD. Two examples are organized in Table 11, and 
additional reflexes of the final vowel in *manuk ‘bird’ are included to demonstrate regular reflexes of 
*u in final syllables. Once again, unexpected reflexes are highlighted, assuming that these all reflect a 
PAT central vowel. 

Table 11: Irregular correspondence sets involving PAT *ə and AN reflexes 


PAn PT PH POB PK Lakkja | S Kam 
u a a a - u eB *daNum ‘water’ 
u uw - - - a) a) *Rabun ‘sky; cloud’ 
u (e) - u fe) oO fo) *manuk ‘bird’ 


One solution to this problem is to posit two separate phonemes: *ə (= regular reflexes) and *ɯ (= 
irregular reflexes). This solution remains problematic, however. If there really were two phonemes, *ə 
and *ɯ, and *ɯ had a tendency to merge irregularly with *ə, then there may be cases 
where *ɯ has merged with *ə in all extant languages, making accurate reconstruction impossible. 
Another possibility is that irregularities in reflexes of *ə all arise from a single phoneme, which typically 
became *ə but in some cases irregularly became *u. In this scenario, it is more beneficial to reconstruct 
schwa as a high vowel like [ɨ] or [ɯ], which may more naturally split into both *ə and *u. Irregular 
developments of high central vowels are not uncommon. The same tendency for central and high back 
vowels to interact irregularly is apparent in Land Dayak languages. PMP *ə is 
reconstructed as a high vowel, [ɨ], in Proto-Land Dayak (Smith 2019), but its reflexes are often irregular, 
resulting in ə, a, u, and Ø in the Land Dayak language Bistaang. Some examples from Bistaang 
are organized in example 7. 


7 Bistaang (Land Dayak, Rensch et al. 2012) 


PMP Bistaang Change 
*botias > *batis > bates ‘calf of the leg’ *o>a 
*zalaq > jara? ‘tongue’ *o>a 
*losun > rson ‘mortar’ *9>@ 
*taluR > turoh ‘egg’ *9>u 
*bulu > bluh ‘body hair’ *u>@ 
*silu > sroh ‘finger nail’ *u>9 


With regard to PAT *ə, because both *ə and *u reflexes are found in AN, a pre-PAN stage where *ə had 
not yet undergone this split is required. Merger with *u in some words may therefore have arisen after 
pre-PAN but before PAN, since the irregularities involving *ə and *u are only visible in AN through 
comparison with KD. That is to say, within AN itself, there is no evidence that some instances of the 
vowel *u may ultimately be from a more ancient central vowel. A schematic is given in example 8, 
which shows the various stages of schwa development. 


8 PAT *ə ([ɨ] or [ɯ]) > pre-PAN *ə ([ɨ ~ u]) > PAN *ə / *u 


In KD, the same irregular changes happen, but only after PKD began to diversify, since the irregularities 
are apparent in KD-internal comparisons. The fluidity between more schwa-like realizations of *ə and 
more u-like realizations persisted into PKD. 

The PAT central vowels are therefore of two types: the low vowel *a, which underwent 
lengthening in PKD and further developed splits in reflexes conditioned by penultimate vowels in many 
KD daughter languages, and the high vowel, for now written as schwa *ə, which stabilized in 
AN but again underwent splits in KD. These splits are explained as arising from the instability of high 
central vowels and their tendency to undergo unconditioned changes. 


4.3 Diphthongs 

In most KD branches, AN word-final diphthongs *aw and *ay correspond to identical diphthongs in KD 
with additional vowel lengthening, *aw : *a:w and *ay : *a:j. The presence of a high vowel in the 
penult, however, had similar effects in diphthongs as elsewhere, resulting in regular high-vowel reflexes 
in Tai and a tendency towards high-vowel reflexes in other KD branches. In example set 9, reflexes of 
*talaw, *qajaw, and *m-aCay reflect lengthening of *a, whereas reflexes of *Cumay have high vowels 
or have transferred features of the penult directly onto the final-syllable vowel. 


9 *talaw ‘afraid’, PT **la:w4, Qabiao laau™* 
*qajaw ‘star’, PT *t.na:w“, PH *ara:u*, Mak ?da-u' 
*m-aCay ‘to die’, PT *p.ta:j*, POB *da;j 
*Cumay ‘bear’, PT **mwwyj“, PH *mui4, S. Kam me®, Lakkja kiti 


4.4 High vowels 

In final syllables, KD high vowels often reflect a set of conditioned splits which depend on the features 
of the now-lost penultimate syllable, with an additional condition triggered by the presence or absence 
of a final consonant. In many subgroups, high vowels break into diphthongs with a lowered nucleus and 
high front or back off-glides. In Tai, for example, this development is regular. Tai nevertheless maintains 
distinctions between reflexes of *aw and *u, and between *ay and *i, through the length of the syllable 
nucleus. For example, although *u becomes aw, it does not merge with *aw, which becomes a:w. 
Reflexes of final high vowels in Tai are shown in Table 12. 


Table 12: Tai reflexes of high vowels in open final syllables 


‘I’ | ‘louse’ | ‘this’ 
PAN *aku | *kuCu | *ni 
PT *kaw* | *traw“ | *naj© 
Bao Yen | kyw‘! | hyw“! | naj 
Cao Bang | kyw‘! | thyw4! | noj 
Lungchow | kaw 


Shangsi__| kaw‘! | thaw4! | noj©? 
Yay ku“! | raw“! | ni@? 
Saek ku: | raw“! | ni: 
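
The way vowel length keeps the Tai reflexes of high vowels and diphthongs apart can be checked with a short sketch. The mapping below simply restates the developments described above (*u > aw, *i > aj, *aw > a:w, *ay > a:j); the names and the string representation of rimes are assumptions made for illustration, and tones are omitted.

```python
# Schematic Tai developments in open final syllables (tones omitted).
# Assumption: rimes are plain strings; "a:" marks a long nucleus.
TAI_OPEN_RIME_CHANGES = {
    "u": "aw",    # high-vowel breaking
    "i": "aj",    # high-vowel breaking
    "aw": "a:w",  # diphthong with lengthened nucleus
    "ay": "a:j",  # diphthong with lengthened nucleus
}


def tai_open_rime(rime: str) -> str:
    """Return the schematic Tai reflex of an open final-syllable rime."""
    return TAI_OPEN_RIME_CHANGES.get(rime, rime)


def has_merger(changes: dict) -> bool:
    """Report whether two distinct inputs end up with the same output."""
    outputs = list(changes.values())
    return len(outputs) != len(set(outputs))


if __name__ == "__main__":
    for rime in ["u", "aw", "i", "ay"]:
        print(rime, ">", tai_open_rime(rime))
    print("merger:", has_merger(TAI_OPEN_RIME_CHANGES))
```

Because the outputs of breaking have a short nucleus and the outputs of the original diphthongs have a long one, the four inputs stay distinct in this model.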


In Hlai and Ong Be, diphthongization is also attested, but in at least PH and PK, reconstructions 
maintain the monophthong, and diphthongization does not occur in all languages. Some examples are 
listed in 10. 


10  *aku ‘I; me’ - PH *aku > Hlai hou', Ong Be hau”, PK *ku > Paha ku*”’, Pubiao kau“!, S. Kam jau²¹² 
*ni ‘this’ - PH *ni > Hlai nei*, Ong Be nia’, PK *ni > Paha ni>’, Pubiao nai, S. Kam nai*?, 
Lakkja ni?*! 


In closed final syllables, a lowering of high vowels can also be observed in Tai (*i > e, *u > o). There 
are not as many examples of high vowels in closed final syllables as elsewhere, so these observations 
are tentative, but they do recur in a number of comparisons. All examples of high-vowel 
lowering in closed final syllables are associated with voiceless stop codas. It is not clear whether these 
generalizations can be applied to words with voiced codas. Tai reflexes are organized in Table 13. 

Table 13: Tai reflexes of high vowels in closed final syllables 
‘bird’ ‘painful’ | ‘hook’ 
PAN *manuk | *sakit (sick) | *kabit 

PT *C.nok? | *ke:t? (hurt) |_*6et? 


Siamese | nok?S? bets! 
Bao Yen | nok?S? - ba 
Cao Bang | nok?S? - bets! 
Lungchow | nuk?%? - bit? 
Shangsi - ket! - 
Yay rok?S? - - 
Saek nok?S? ket}! - 


Outside of Tai, lowering in closed final syllables can be observed in some languages, especially in Kra, 
but elsewhere high-vowel lowering is generally not a regular process. Examples are listed in 11. 


11 = *-tug ‘to fall’ (root) - PH *?tuk, PK *tok?, S. Kam tok*> 
*qoatut ‘to fart’ - PH *(?)tu:t, POB *diut, 
*sakit ‘sick, in pain’ - Lakkja tse:t°*, S. Kam ?it°* 


Despite regular lowering in Tai, high vowels remain high in final syllables if conditioned by the 
presence of a high vowel in the reconstructed penult. For example, reflexes of *qudip ‘life; alive; raw’ 
are high in most groups, including Tai, although Ostapirat (2000) reconstructs PK *(k)dep”, again 
indicating that the effects that high vowels in penultimate syllables had on final-syllable vowels are not 
uniform across the family. Examples are organized in 12. 


12. *qudip ‘alive; raw’ - PT *C.dip?, PH *uri:p > Hlai rip, POB *zip, PK *(k)dep” 
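
The interaction between closed-syllable lowering and a high penult in Tai can be sketched as a single conditional. This is a simplified illustration under stated assumptions: the function name and the representation of each word as a pair of vowels are hypothetical, and the rule is only meant for words ending in a voiceless stop, since that is the only environment in which lowering is attested here.

```python
# Schematic Tai rule for high vowels in closed final syllables.
# Assumption: each word is reduced to its penultimate and final vowels.
LOWERING = {"i": "e", "u": "o"}
HIGH_VOWELS = {"i", "u"}


def tai_closed_final_vowel(penult_vowel: str, final_vowel: str) -> str:
    """Lower a final high vowel unless the penult vowel is itself high."""
    if final_vowel in LOWERING and penult_vowel not in HIGH_VOWELS:
        return LOWERING[final_vowel]
    return final_vowel


if __name__ == "__main__":
    examples = {
        "*manuk": ("a", "u"),  # lowering expected
        "*kabit": ("a", "i"),  # lowering expected
        "*qudip": ("u", "i"),  # high penult, vowel expected to stay high
    }
    for pan, (penult, final) in examples.items():
        print(pan, tai_closed_final_vowel(penult, final))
```

In this model *manuk and *kabit come out with mid vowels, while *qudip keeps its high vowel, in line with the Tai forms cited in Table 13 and example 12.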


It is important to make these correspondences between various subgroups and possible AT etyma clear, 
since the major task in AT studies is still determining the number of supported comparisons between 
the two families. A better understanding of regular sound correspondences allows those working in AT 
to rule out comparisons which may be only superficially similar, strengthening the core set of 
comparisons. Also, the more that is understood about correspondences, the better we will be able to 
spot potential comparisons that might have previously gone overlooked. The vowel correspondences 
discussed above are summarized below: 

• PAT *a underwent lengthening in KD, becoming *a:. Further, in most branches the quality of the 
penultimate vowel influences reflexes of *a:. In Tai, high-vowel penults result in a raising of 
reflexes of *a: to *ɯ:, and in other subgroups the features of the penultimate vowel are often 
transferred to the final-syllable vowel. 

• PAT *ə underwent an unconditioned split which manifests itself in a mismatch between AN and 
KD words. Typically, PAN *ə corresponds to *a in most KD branches, and *ə in PK. Irregular 
correspondences have PAN *ə and *u corresponding to KD u, ɯ, ə, and a. It is hypothesized that 
PAT *ə was a high vowel [ɨ], which then underwent a split. 

• The diphthongs *aw and *ay developed in parallel with *a: they lengthened but were also affected 
by the presence of a high vowel in the reconstructed penult. 

• The high vowels *i and *u did not change in final syllables between PAT, PAN, and PKD, although 
in several KD branches word-final high vowels underwent regular diphthongization (*i > aj and 
*u > aw) and high vowels in closed final syllables lowered to mid vowels. 


5 Conclusion 

As counted in this paper, there are just over 70 reasonably well-attested and supported comparisons 
between AN and KD which may descend from a common ancestor, PAT. Comparisons meet the 
conditions that (1) they are reconstructable to at least one primary branch in each family, (2) they are 
regular with regard to sound correspondences, and (3) KD monosyllables correspond to the final syllable 
in AN, assuming that KD monosyllabicity was driven primarily by the reduction and eventual deletion 
of an unstressed penultimate syllable. The numbers derive largely from previous works, but with 14 
additional new comparisons. 

With an increase in comparisons between AN and KD comes an increase in our understanding of 
the sound correspondences which exist between the two families. Although irregularity certainly exists, 
there is a high degree of regularity as well. The vowel correspondences, of particular interest in the 
present research, show typical developments. There was likely a PAT low vowel *a, which remained 
unchanged in PAN but underwent lengthening in PKD, yielding *a:, and was affected by the presence 
of high vowels in PAT penultimate syllables. The central vowel, PAT *ə, yields unpredictable reflexes 
in both KD and AN. It is evident that PAT *ə was a high central vowel, which became either *ə or *u 
in PAN, and *a in most KD branches, where it is also irregularly reflected by high and mid back vowels. 

Lengthening of *a also occurs in the diphthongs, where *aw and *ay become *a:w and *a:y 
respectively in KD. In many subgroups, this lengthening prevents a merger, since many 
languages have high-vowel breaking in open final syllables (in Tai, for example, high-vowel 
breaking is regular). High vowels also undergo lowering to mid vowels where they appear in closed 
syllables in Tai, as well as in some other subgroups, like Kra. This lowering, however, is blocked by 
the presence of a high vowel in the reconstructed penult and is only attested in words that end in a 
voiceless stop. More comparisons are undoubtedly needed to make more concrete descriptions of 
high-vowel development. In AN, by contrast, at least at the PAN and PMP levels, the vowels change 
much less frequently. Vowel breaking, high-vowel lowering, and the effects of high-vowel penultimate 
syllables on reflexes of final-syllable vowels are present in AN, but not at the level of first-order 
proto-languages. 

The AT Hypothesis remains tentative, although the evidence in its favor continues to 
grow. The evidence for a special relationship between AN and KD is greater in both quality and 
quantity now than at any time in the past, and it is hoped that more research in the area will help us 
understand the precise nature of this relationship. 
Appendix 1: Data sets from Kra-Dai with Proto-Austronesian comparisons 
afraid: PAN *talaw, PT *"la:w4, Bao Yen la:w“!, Cao Bang la:w“!, Lungchow la:w“!, Yay la:w“!, Saek 
la:w“!, Qabiao laau™, Buyang i''vau™, S. Kam jau™. 


arrive PAN daton PT *C.tvnA Siamese thumAl Sapa thumA Bao Yen thynAl Cao Bang t'nAl 
Lungchow t'yyA1 Shangsi Yay tanA2 Saek t"anA2 PH Hlai POB dan Al Ong Be Gelao Paha 
khau® Pubiao Qabiao Buyang Lakkja tan231 (come) Mak S. Kam ten55 (come) N. Kam tau25 




Papers from SEALS 30 — Smith 


bear: PAN *Cumay, PT **mwuij*, Siamese mi:4!, Sapa mi‘!, Bao Yen mi:“!, Cao Bang mi*!, 
Lungchow mi:“', Shangsi muy“', Yay muraj“', Saek mutoj*!, PH *mui4, Paha mi32?, Pubiao 
mfje’’, Lakkja kub:i, S. Kam me”. 


bird: PAN manuk-manuk, PT *C.nok?, Siamese nok?’?, Sapa nuk?S?, Bao Yen nok?’’, Cao Bang 
nok?®?, Lungchow nuk?*’, Yay rok?®?, Saek nok?S?, PH *sa°, Hlai tatt’?, POB nuk”? Ong Be nok® 
Gelao ma**no*, Paha nok", Pubiao nokn*”, Qabiao nuk**/niuk*, Buyang ma°nuk!!, Lakkja 
mlok’, Mak nok®, S. Kam mok?!, N. Kam no?!?. 


black: PAN *domdom, PT *C.dam“, Siamese dam”, Sapa dam“’, Bao Yen dam“?, Cao Bang dam’, 
Lungchow dam*’, Shangsi nam“!, Saek ram“!, PH *(?)dam°, Hlai dom’, POB *zam‘“!, Ong Be 
lam', Paha lam3??/dam3!, Pubiao ?dam“!, Qabiao dam*’, Buyang ?dam*!, Lakkja lam>!, Mak nam!, 
S. Kam nem*®, N. Kam nom*. 


blow: PAN *Soyup, Buyang hip™, Lakkja jap, S. Kam sap”!, N. Kam sap". 


boat: PAN *aluja ‘to paddle’, PT *C.rwuro“ ‘boat’, Siamese ruio“”, Sapa hur”, Bao Yen luo“? !, Cao 
Bang luo, Lungchow luw::4”, Shangsi lu“?, Yay ruo*”, Saek ruo“!, PH “ura“ ‘boat’, Paha da’, S. 
Kam lo», N. Kam la®. 


chaff; bran: PAN *qopah, POB va:®°, Gelao pau®', Buyang faa®', Lakkja faa®', N. Kam pa®. 


child: PAN *aNak, PT *luw:k?, Siamese lu:k?'?, Sapa lu?°'?, Bao Yen luk?“*, Cao Bang luk?"’, 
Lungchow luk”, Shangsi lak?"!, Yay lwik?%?, Saek Iuk?"!, PH *ali:k, Hlai twk’law’, POB lo:k?, 
Ong Be lok®, Gelao la*lie>, Pah laak!!, Buyang la:k!!, Mak lak®, S. Kam lak?!?un*”*, N. Kam 
1a??!?un®?. 


cry: PAN *Canis, PT *t.haj©, Siamese ha:j“', Bao Yen haj“', Cao Bang haj©', Lungchow haj‘', Shangsi 
haj°', Yay taj©', Hlai nai*, POB *naj8°'!, Ong Be nai’, Pah nit'', Buyang niet”', Mak ne*, S. Kam 
ne*?, N. Kam ne*’. 


derris root: PAN *tubah, PT *C.buro‘, Siamese bura‘!, Sapa bur“!, Bao Yen bua“! Cao Bang buio*', 
Lungchow bur:“, Saek vio“!. 


die: PAN *m-aCay, PT *p.ta:j4, Siamese ta:j4', Sapa ta:j*', Bao Yen pta:j‘!, Cao Bang ta:j*!, 
Lungchow ha:j“', Shangsi t'a:j“', Yay ta:j*', Saek pra:j*', POB *da:j“', Ong Be dai!, Qabiao tie*’, 
Buyang ma"te™ ‘kill’, Lakkja plei>', S. Kam tei>°, N. Kam toi. 


eat: PAN *kaon, PT *kuin“, Siamese kin‘, Sapa kin“', Bao Yen kin‘!, Cao Bang kin4', Lungchow 
kin“!, Shangsi kyn“!, Yay kuin“!, Saek kin“!, POB kon“!, Ong Be kon!, Gelao ka*!, Paha kaan?2, 
Pubiao kan“!, Qabiao k(upjan**, Buyang ka:n™, Lakkja tsen>', S. Kam tan*°. 


eight: PAN *walu, PH *aRu‘, Paha mu*!, Pubiao rfituw’, Qabiao mo°zuy**, Buyang duu*?. 


excrement: PAN *Caqi, PT *C.quuj°, Siamese k*i:“', Bao Yen k*i:“', Cao Bang k*iS!, Lungchow kti:“', 
Shangsi k*oy“!, Yay haj©’, Saek yaj©’, PH *aka:i©, POB ka:j2©, Paha qe**, S. Kam ?e*', N. Kam 
e7!, 


eye: PAN *maCa, PT *p.ta:“, Siamese ta:4', Sapa ta:“!, Bao Yen p'ja:4!, Cao Bang t*a:“!, Lungchow 
ha:4!, Shangsi tha:“!, Yay ta“!, Saek pra:“!, PH “ata, Hlai tsha!, POB da:“', Ong Be da!, Gelao 
mu*tur!, Paha ma**da?””, Pubiao tee*!, Qabiao te**, Buyang ma°ta™, Lakkja pla>!, Mak da!, S. 
Kam ta>’, N. Kam ta®. 


father: PAN *ama(x)/*aba, PT *bo:®, Siamese p*s:8, Sapa po®, Bao Yen po:®*, Cao Bang bo®?, 
Lungchow po:®”, Shangsi po® *, Yay po®?, Saek p*s:8*, PH *pa©, Hlai pha’, Ong Be be’lau’, 
Gelao a*ba*’, Pah pa33, Pubiao pe*!?, Buyang pai, Lakkja pe®, Mak pout, S. Kam pu*!. 


fire: PAN Sapuy, PT *wxj4, Siamese faj*”, Bao Yen p*yj*”, Cao Bang vxj*”, Lungchow faj*”, Shangsi 
foy*’, Yay fi*?, Saek vi:“?, PH *api‘, Hlai fei', POB *va:j4”, Ong Be vai’, Gelao pia*’, Paha pwi322 
Pubiao pei*!, Qabiao poi’, Buyang pui™, Lakkja pu:i', Mak vai!, S. Kam pui®, N. Kam wi. 
five: PAN *lima, PH *ama‘, Hlai pa!, Gelao mlur*!, Paha ma**, Pubiao mfiaaa*?, Qabiao ma**, Buyang 
‘e310 


flow; current: PAN *qaluR, PT *"Iwaj*, Siamese laj*', Sapa laj*!, Bao Yen Iwxj‘', Cao Bang Iwxj*', 
Lungchow laj*!, Shangsi laj4!, Yay laj“!, POB lo:j4!, Ong Be loi', Gelao hlei*’, Paha qwi?, 
Pubiao tei“!, Qabiao tai>’, Buyang lui4”, Mak lu:i', S. Kam ui, N. Kam tuil’. 


grandfather: PMP *smpu, PT *puw®, Siamese pu:®', Sapa pu®', Bao Yen pu:®!, Cao Bang pu®!, PH 


*paul, 


hand: PAN *qalima, PT *mwui:“, Siamese mut:*”, Sapa mut“”, Bao Yen mur:4?, Cao Bang mur’?, 
Lungchow mui:*”, Shangsi moy*”, Saek mu:4”, PH *mi‘, Hlai mew', POB ma:*, Ong Be mo’, 
Gelao pa*'mi*!, Paha ma*¥ge*C”), Pubiao hmii®', Qabiao qa°hmi?'?, Lakkja mie”*', S. Kam mja?”, 
N. Kam mja”’. 


head: PAN *qulu, PT kraw‘/truo“, Siamese /huo“!, Sapa /hu!, Bao Yen huo“!, Cao Bang /t®uo*!, 
Lungchow /hu:“!, Shangsi law°!/, Yay caw“!/, Saek ttraw©!/, PH *uRou‘, Hlai gwou*, POB ha:w 
BCl Ong Be hau’, Gelao te*lui*, Buyang qa°du!!, Lakkja kjeu>!, Mak tau*, S. Kam kau*”*, N. 
Kam kau*’. 


hold in fist: PAN *komkom, PT *kam“, Siamese kam‘“', Sapa kam“', Bao Yen kam*', Cao Bang kam*', 
Lungchow kam“', Shangsi kam4', Yay kam“!, Saek kam“!, Lakkja kam* 


hook: PAN *kabit, PT *6et”, Siamese bet?S!, Sapa bit?S', Bao Yen bxt?S!, Cao Bang bet?S!, Lungchow 
bit?. 
I: PAN *aku, PT *ku:4/*kaw4, Siamese ku:“!, Sapa ku“!, Bao Yen/kyw*“!, Cao Bang /kyw“!, Lungchow 


/kaw*', Shangsi /kaw*!, Yay ku“!, Saek ku:4!, PH *aku‘, Hlai hou!, Ong Be hau’, Paha ku, 
Pubiao kau“!, Qabiao kau*’, Buyang ku™, Lakkja tsi>!, S. Kam jau”!”, N. Kam jau”’. 


leech: PAN *moCak, PT *da:k?, Siamese ta:k?!’, Sapa ta?>'?, Bao Yen ta:k?"*, Cao Bang da:k?", 
Lungchow tek?!?-Y, Yay ta:k?“!, Saek tha:kP!?, PH *?ta:k, POB da:k?!, Paha na?!taak”’. 


leg; thigh: PAN *paqa ‘thigh’ PT *p.qa:* ‘leg’, Siamese k*a:“', Sapa xa:4', Bao Yen k'a:“', Cao Bang 
kta:4!, Lungchow kta:4!, Shangsi ha:“!, Yay ka“!, Saek kwa:“!, Hlai ha!, Ong Be va” Paha xga"', 
Buyang ?aa“!, S. Kam pa®, N. Kam pa*lau*'. 


lend/borrow: PAN *Sozam, PT *?ju:m“, Siamese ju:m4', Cao Bang jumm4', Lungchow jim“, Shangsi 
jom“!, Lakkja lam*', S. Kam jam®, N. Kam jam*. 


louse: PAN *kuCu(x), PT *traw4, Siamese haw“!, Sapa haw“!, Bao Yen hyw“!, Cao Bang t*yw4!, 
Lungchow haw“!, Shangsi thaw“!, Yay raw“!, Saek raw“!, PH *utu’, Hlai fou’, Paha 6u3, Qabiao 
qa°tau>*, Buyang qa°tu™, Lakkja ta:u, Mak to’tou', S. Kam tau, N. Kam tau*. 


moon: PAN *bulaN, PT *6luron*, Siamese duron“', Sapa buron“!, Bao Yen buran“!, Cao Bang burton“, 
Lungchow by:n4!, Shangsi bun“!, Yay duron4!, Saek blion“', PH *pa:n4, Hlai nyaan', Paha 
naan*°6”), Pubiao nin“'/taan*', Qabiao taan*’, Buyang lun!!ten!!, Lakkja man!'lie:n?!4, Mak ni:n’, 
S. Kam kwan>>nan°*’, N. Kam mjan*. 

navel: PAN *puja, PT *dwui:4, Siamese du:“', Sapa du“!, Cao Bang duro“!, Shangsi boy©! *, Yay 
duro“!, Saek duo“!, PH *uri4, POB *da:4*, Buyang ?duo“!, S. Kam pjo™ljo®. 


213 


nine PAN *Siwa, Paha dfa**, Pubiao ¢ja®!, Qabiao mo*xia”!*, Buyang vaa®". 


nose: PAN *ujun/*ijun, PT *dan’, Siamese ?dan°!, Sapa dan*!, Bao Yen dan4!, Cao Bang dan“!, 
Lungchow dan‘', Shangsi dana*', Yay dan‘“', Saek dan‘*!, PH *(?)dan*, POB zan*', Ong Be long’, 
Pubiao tan“!, Qabiao qa**tan**, Buyang ga°tin?!?, Lakkja nan*!, S. Kam nen, N. Kam nan*. 


one: PAN “isa, PH *ci°, Hlai tsheur’/tsw’, Ong Be ho’, Gelao tsi>°, Pah ti**, Pubiao teja“!, Qabiao teia®. 


otter: PAN *Sanagq, PT *na:k?, Siamese na:k?“*, Sapa na??'*, Bao Yen na:k?'’, Cao Bang na:k?", 
Shangsi na:k?", Yay na:k?'’, Saek na:k?'’, PH *(9)na:k, Paha naak"". 
rattan: PAN *quay, PT *C.wa:j‘, Siamese wa:j*', Sapa va:j4', Bao Yen wa:j‘!, Cao Bang wa:j*', 
Lungchow va:j*', Shangsi wa:j*!, Yay va:j*!, Saek va:j4!. 


raw; unripe; alive: PAN *qudip ‘alive’, PT *C.dip? ‘raw; uncooked’, Siamese dip?’!, Sapa dip?S', Bao 
Yen dipS', Cao Bang dip?S', Lungchow dip?’!, Shangsi dip?S!, Yay dip?S!, Saek rip?’', PH 
*uri:p, Hlai ri:p/vi:p, POB *zip?', Paha dap®, Pubiao ?dap?', Buyang ?a?dip. 

sell: PAN *baliw/*saliw, PH *aRi:u, PK *s-ywi. 

seven: PAN *pitu, PH *?tu4, Paha 6u%3, Pubiao tuu“!, Qabiao motu’, Buyang tuu*’. 

shadow: PAN *qaNinu, PT *naw“, Siamese naw*”, Bao Yen nxw*”, Cao Bang nyw*?, Lungchow 
naw’”, Shangsi naw’, Yay naw’’, Saek naw’, PH *ana:uS, POB nu:j*'. 


shoulder: PAN *qabaRa, PT *C.ba:®, Siamese ba:®', Sapa ba:®', Bao Yen ba:®', Cao Bang ba:®', 
Lungchow ba:®!, Shangsi ba:8', Yay ba®!, Saek va:®', PH *ava®, Hlai tswi’va’, POB via®“', Ong 
Be bik’via*, Paha ka°yo*5ma**, Pubiao hmaa®', Buyang qa°?ba!!, Mak ha!, N. Kam pja!!. 


shrimp: PAN *qudan, PH *ura:n“, POB *zuan*”, Lakkja tson™*, S. Kam ton. 
sick; pain; to hurt: PMP *sakit, PT *ke:t? ‘hurt’, Shangsi ket?"!, Yay cet?"', Saek ke:t?™!, Paha 6i!!, 
Lakkja tse:t™, S. Kam ?it*?, N. Kam kit®’. 


six: PAN *onom, PH *(9)num*, Hlai tom', Gelao nam*'!, Paha nam3!, Pubiao hnam“!', Qabiao 
mo°hnam*™, Buyang nam”, 


sky/cloud: PAN *Rabun ‘cloud’, PT *6urn“ ‘sky’, Siamese bon“!, Bao Yen bon“!, Cao Bang byn“!, 
Shangsi bon“!, Yay bun“!, Saek buin“!, Buyang ?bun™, Lakkja bon?!, Mak ?bon!, S. Kam mon, 
N. Kam mon*. 


spotted: PMP *bolan, PT *6la:n®, Siamese da:n®', Sapa ba:n®', Bao Yen bja:n®', Cao Bang da:n?'. 


star: PAN *qajaw ‘day’, PT *t.na:w“, Siamese da:w“', Sapa da:w“!, Bao Yen da:w“', Cao Bang da:w“', 
Lungchow da:w“!, Shangsi da:w“!, Yay da:w“!, Saek tra:w“!, PH *ara:u4, Hlai raau', Lakkja 
tau+, Mak ?da:u!?doi°. 


sour: PMP *qosom, PT *sym°, Siamese som“!, Bao Yen t*om“!, Cao Bang tfom!, Lungchow tum“, 
Shangsi tom©!, Yay éam'!, Saek sam, Lakkja khjum™, S. Kam som!, N. Kam som”. 


taro: PAN *biRag, PT *prurok”, Siamese p*urak""!, Sapa p'w??"', Bao Yen pwok?', Cao Bang 
p*urok?"!, Lungchow p'y:k?"!, Shangsi p®yk™"', Yay pwok?"!, Paha pwaak"!, Buyang daak”’, 
Lakkja ja:k™, S. Kam jak*”’. 

this: PAN *-ni, PT *naj°, Siamese ni: ’, Sapa ni©, Bao Yen naj©, Cao Bang nxj©, Lungchow naj’, 
Shangsi noy®, Yay ni’, Saek ni: ’, PH *ni®, Hlai nei, Ong Be nia’, Gelao nyi*’, Paha ni®, 
Pubiao nai©, Qabiao nai**, Buyang ni!!, Lakkja ni7*!, S. Kam nai**, N. Kam nai“. 

three: PAN *tolu, Paha tu>, Pubiao tau“', Qabiao tau’, Buyang tuu“!. 


to fall: PAN *-tug, PH *?tuk, Gelao to*, Paha tok**, Pubiao took?!, Buyang tuk?', S. Kam tok*, N. 
Kam to?». 


to fart: PAN *gotut, PT *k.tyt?, Siamese tot?S!, Sapa tut?S!, Bao Yen tyt?S!, Cao Bang tvt?5!, Shangsi 
thot?S!, Yay rat?S!, Saek ret?S!, PH *(?)tu:t, POB *dut?!, Paha dat®, Pubiao tat?!, Buyang tut”!, 
Lakkja kja:t?, S. Kam tat. 


to plant: PAN *mula, PH *uRa‘/*uga‘, Hlai gwa!, S. Kam mja?!”,N. Kam mja”’. 


to transplant: PAN *tanom, PT *t.nam‘, Siamese dam“!, Bao Yen dam*“', Cao Bang dam“!, Lungchow 
dam‘“', Shangsi dam*', Yay dam*', Saek tram“!, PH *?dap, Paha dam**wa’*2?. 


tongue: PAN *Soma, Gelao dw**maw*!, Paha ma?!, Pubiao mfije4”, Qabiao mie**, Buyang mee*?, 
Lakkja wab, Mak ma’, S. Kam ma”!”, N. Kam ma”. 
tooth: PAN *[n/n/lJipan, PT *wan4, Siamese fan“”, Bao Yen p*an4’, Cao Bang van*?, Lungchow fan‘”, 
Yay fan*’, PH *ipan4, Hlai fan', Lakkja wan’, S. Kam pjen*, N. Kam pjan*®. 


two: PAN *duSa, Gelao sui*', Pah a3”, Pubiao cee*', Qabiao ze**, Buyang ca™, S. Kam ja’!”, N. Kam 


j qz2 


uncertainty marker: PAN *-nu, Ong Be lou’/no’na*, Paha pa3nau33, Pubiao njau, Qabiao niau*®, 
Buyang noo’, S. Kam neu?!?, N. Kam nou”. 


vomit: PAN *utag, PT *rwuwiok?, Siamese ra:k?!’, Sapa ha??!*, Bao Yen ra:k?'*, Cao Bang ra:k?!’, 
Lungchow ta:k?!?, Shangsi luk?“*, Yay ruok?!, Saek ruok?!*, PH *apa:k, Hlai feek’, POB 
*duak>?, Ong Be duak, Gelao qo**ta**, Paha taak33, Buyang ta:u*!?, Lakkja ta:k, Mak du:k. 


water: PAN *danum, PT *C.nam°, Siamese na:m®, Sapa nam®, Bao Yen nam©’, Cao Bang nam, 
Lungchow nam“, Shangsi nam®, Yay ram, Saek nam®’, PH *nam°, Hlai nom*, POB nam®@, 
Ong Be nam‘, Lakkja num!', S. Kam nem*!, N. Kam nom*!. 


you: PAN *kaSu/Simu, PT *mum“/*mauy4, Siamese mum“, Sapa muin*?, Bao Yen /myuy’’, 
Lungchow /mauy*”, Shangsi man“! *, Yay mum”, Saek mum*”*, PH *mi“, Hlai mew, Ong Be 
mo’, Gelao mur!, Paha mo*!, Pubiao maa*’, Qabiao mi*’, Buyang maa*”, Lakkja ma”*!. 
Abstract 

This paper describes the two main types of clauses observed in Itawit, a language 
spoken in Northern Luzon. It examines the clauses of Itawit using 
gathered spoken data, including utterances in naturally occurring conversations, 
spoken-like data, and narrative data. A nonverbal clause is usually headed by a non-verb 
constituent. This study gives attention to five types of nonverbal clauses: nominal predicate, 
adjectival predicate, existential predicate, prepositional predicate, and locative predicate. Verbal 
clauses, on the other hand, are usually headed by verbs that occupy the initial position in 
the clause. This analysis supports Reid and Liao’s (2004) argument that 
Philippine languages, now including Itawit, typically follow a right-branching clause 
structure; that is, heads of constructions usually appear in the initial position in the 
constructions. This paper also distinguishes intransitive constructions from transitive 
constructions. 


Keywords: verbal clause, non-verbal clause, transitive construction, intransitive 
construction 
ISO 639-3 codes: itv 


1 Introduction 

Itawit is one of the many ethnolinguistic groups in Cagayan, together with Agta, Paranan, Kalinga, 
Gaddang, Yogad, Bugkalot, Ilocano, Ifugao, Tinguian, and Ibanag. “Itawit” comes from the 
prefix i, meaning “people of,” and the word tawid, or “across the river,” and thus it means “the people 
from across the river.” Simons and Fennig (2018) list Itawes and Itawis as alternate names for the 
Itawit language. This probably explains the preponderance of the terms Itawes and 
Itawis in existing literature. However, in the latest edition of Peoples of the Philippines, Zafra and 
Lucero (2017) explain that the language has been spelled “Itawes” since the 
Spanish colonial period, but that the speakers call themselves Itawit because they pronounce the letter 
[s] as [t] when it is the last letter of a word. Moreover, the Itawit speakers themselves insist that they 
and their language be called Itawit; hence, this article consistently uses the term Itawit. 

Itawit belongs to the Cordilleran subgroup of the Malayo-Polynesian branch of the Austronesian 
language family (Reid 1974, 2006). Cordilleran comprises the Central group, the Southern 
Cordilleran subgroup, and the Northern Cordilleran subgroup. Northern Cordilleran consists of at least 
the following languages: Ibanag, Gaddang, Yogad, Isneg, Malaweg, Itawis (also called Itawit), Ilokano, 
and the languages of the various Negrito groups of Cagayan, Isabela, and Aurora provinces, labeled 
variously as Agta, Atta, and Dumagat. 

Itawit is classified as a member of the Cagayan Valley subgroup of the Northern Cordilleran group 
of Northern Luzon, Philippines, as shown in Figure 1. Ibanag, Ga’dang, Northern Cagayan Agta, Atta, 
Yogad, and Isnag are also members of the Cagayan Valley subgroup. 
Figure 1: The revised sub-grouping of Cordilleran languages (Reid 2006:4) 

[Tree diagram. Legible node labels: Cordilleran; Meso-Cordilleran; Northern Cordilleran; Alta; South-Central Cordilleran; Southern Cordilleran; Central Cordilleran; Cagayan Valley; North-Eastern Luzon; West Southern Cordilleran; Nuclear Southern Cordilleran; North-Central Cordilleran; Nuclear Cordilleran; Kalinga-Itneg; Gaddangic.] 

Languages named in the abbreviation key: Adasen, Central Cagayan Agta, Northern Alta, Southern Alta, Arta, Atta, Balangao, Bontok, Casiguran Dumagat, East Cagayan Dumagat, Palanan Dumagat, Gaddang, Ibanag, Inibaloi, Ifugao, Ilokano, Ilongot, Isnag, Isinai, Itneg, Itawis, I-wak, Kalinga, Kalanguya, Kankanaey, Karao, Kasiguranin, Malaweg, Pangasinan, Paranan, Yogad. 


In terms of endangerment or development status, Ethnologue labels Itawit as a developing 
language, following Lewis and Simons’ (2010) Expanded Graded Intergenerational Disruption Scale 
(EGIDS), a tool used to measure the status of a language in terms of endangerment or development. 
Under this scale, a developing language is a ‘language that is in vigorous use, with literature in a 
standardized form being used by some though this is not yet widespread or sustainable.’ 

Itawit, spoken by nearly 189,000 speakers according to Ethnologue, is inadequately 
described, as there are very few studies of its grammar. What may 
be the earliest work on Itawit is Jalojot (1937), entitled Diskripsyon ng Klos na Verbal 
ng Wikang Itawit. Next was probably Natividad and Solomon’s (1970) list of phrases and clauses in 
Itawis, followed by an Itawit wordlist of 372 lexical items, unpublished language data collected by 
SIL International in the Philippines in 1976. Later, Dita (2013) wrote an Itawit-Filipino 
Dictionary. A few authors have compared some linguistic features of Itawit with 
other languages. One was the work of Bollas and Hernandez (2013), which presented the phonology of 
Tagalog, Cebuano and Itawis by introducing the languages’ phonemic inventories, contrastive pairs and 
their phonotactics. In the same year, having noted the scarcity of records on 
pluralization, especially in Ayta Mag-antsi and Itawit, Bollas and Supnet (2013) undertook a 
comparative analysis of pluralization in four Philippine languages: Ayta Mag-antsi, Itawit, Bikol and 
Tagalog. The results show that Itawit and Ayta Mag-antsi have unique pluralizing morphemes, 
especially in forming proper and common nouns, as compared to Bikol and Tagalog. In 2014, Bollas 
and Hernandez examined anaphors and their anaphoric relations in Tagalog, 
Hiligaynon and Itawit using Principle A of the Government and Binding theory. MacKenzie (n.d.) 
provided an overview of reduplication in Itawit and argued that several reduplicative templates are 
available in Itawit. His work demonstrated that reduplication in Itawit shows a resistance to destroying 
syllable contact. The morphological processes of Ibanag and Itawit 
were examined by Elli and Isidro (2013). Their study showed that the two languages seem to be quite closely 
related because their morphological processes are almost the same. Itawis, they argued, has fewer 
morphological processes in nouns and verbs than Ibanag because it does not have full reduplication in 
nouns, stress placement and added morphemes in verbs. In 2016, the Department of 
Education of Region 02, in close coordination with the Komisyon sa Wikang Filipino, drafted and 
released an Itawit orthography, which was written by teachers who are speakers of the language and 
validated by a group of elders who served as consultants. 

Considering the above-mentioned works, very few linguistic features of Itawit have been studied, 
which leads us to the observation that Itawit is relatively less studied among the Philippine languages. 
Hence, this paper attempts to present a descriptive analysis of the basic clauses in Itawit. The main aim 
of this paper is to present an analytic description of the clauses in Itawit as seen in the gathered spoken 
data and as attested by the chosen language informants. 


2 Methodology 

The data samples on which the descriptions and analyses are based are composed of transcribed 
recordings of naturally occurring conversations involving twelve male speakers and eighteen female 
speakers. All conversations were recorded by the researchers from November 2019 to February 2020. 
These conversations took place in various settings, such as school grounds (teachers with students or other 
teachers), public plazas (friends and neighbors), and homes (couples and children). The participants were 
selected on the following basis: (a) They are native speakers of Itawit; (b) They are knowledgeable 
about the Itawit culture; and (c) They can read and write in Itawit. Aside from the naturally occurring 
conversations, the paper also made use of colloquial-written data which came from two active Facebook 
pages that primarily encourage the speakers of Itawit to use the language in the community. Appropriate 
consent was secured from the administrators of the Facebook pages, and utterances taken from the pages 
were posts from 10 August 2019 to 4 September 2020. 

To add to the natural conversations and spoken-like data, the researchers also gathered narrative 
data from Itawit speakers using the Pear Film, a six-minute-long video which is designed to elicit 
reactions and narrations from the participants using their language. The informants for Pear stories are 
Itawit teachers and students from an Itawit-speaking barangay, which is the smallest administrative 
division in the Philippines. They were chosen based on the same criteria mentioned previously. They 
were requested to watch the six-minute film, and afterwards they were asked to narrate the story in Itawit. 
They were reminded to just describe what they had viewed and that there were no right or wrong 
answers in the task. All their responses were recorded and eventually transcribed for linguistic analysis. 
A corpus of written data was also gathered, but these were used for cross-referencing purposes only as 
the objective was to reflect the features of present-day Itawit. 

It must be noted, though, that the clauses are taken from varied data (written, spoken, spoken-like, 
and narrative data); hence, they could be compared for similarities, and in cases where there were 
discrepancies and irregularities, the researchers consulted Itawit native speakers/informants before 
making any final grammatical judgment. Working closely with the language informants, the 
authors also employed some linguistic elicitation techniques. 


3 Itawit as a VSO Language 

It is generally argued that the Philippine languages are basically VSO languages. By VSO, we mean 
that sentences produced in Philippine languages follow the structure which begins with the predicate. 
Thus, such clauses are constructed with verbs occurring in the initial position, while pronominals, 
modifiers, and objects are positioned after the verb. Itawit speakers generally begin their clauses with a 
verb, especially in spoken discourse. However, some utterances may also start with a non-verb item. 
This section discusses the two types of clauses in Itawit: nonverbal clauses and verbal clauses. 


3.1 Non-Verbal Clauses 
Nonverbal clauses in Itawit are headed by a constituent which does not belong to the category of verbs. 
Five types of nonverbal clauses are presented in this subsection: nominal predicates (3.1.1), adjectival 
predicates (3.1.2), existential predicates (3.1.3), prepositional predicates (3.1.4), and locative predicates 
(3.1.5). 


3.1.1 Nominal predicate clauses 

As the term ‘nominal predicate clause’ implies, the predicate is a nominal and thus takes the initial 
position and is followed by the nominative complement which can be a full NP or a pronominal. Four 
types of nominal predicate clauses are discussed here: (a) classificational, (b) identificational, (c) 
quantificational, and (d) possessive. 


a. Classificational nominal clause 

A classificational nominal clause is defined by Reid and Liao (2004:436) as “those that classify the 
entity expressed in the nominative phrase of the clause”. The authors maintain that the predicate noun 
is typically a bare noun without a specifying determiner, and since it is a predicate, it is interpreted as 
the head of the predication. As observed below, the lexical items in boldface that begin the utterances are 
both nouns that classify the entities Marge and atawa. 


(1) Empleyadu i Marge kang munsipyo. 
employee PERS Marge OBL municipal.office 
‘Marge is an employee at the municipal office.’ 


(2) Seaman ya atawa na. 
seaman DET husband GEN 
‘Her husband is a seaman.’ 


Example (1) begins with the noun empleyadu, which is followed by the nominal marker i, which signals 
the nominal Marge. The nominative complement in the said clause is, therefore, a full NP: i Marge. 
The peripheral argument kang munsipyo is case-marked as OBL. Example (2) also begins with the noun 
seaman, which does not have a direct counterpart in Itawit. It must be noted that the noun atawa is 
introduced by a nominal marker ya. Ya atawa na then serves as the nominative complement. Notice 
that the pronominal na is case-marked as genitive because it is used to imply an idea of possession. 

In some cases, the nominal complement may be expressed by a pronoun, either a free pronoun or 
a bound pronoun which may or may not encliticize with the nominal predicate. 


(3) Seaman iggina. 
seaman ABS.3s 
‘He/she is a seaman.’ 


(4) Memestra ira. 
teachers ABS.3p 
‘They are teachers.’ 


(5) Empleyadu nak kang munisipyo. 
employee ABS.1s OBL municipal.office 
‘I am an employee at the municipal office.’ 


b. Identificational nominal clause 

Reid and Liao (2004) define identificational clauses as those in which the predicate provides specific 
identification for the entity expressed in the nominative noun phrase. They further mentioned that, while 
classificational predicates are typically bare nouns, identificational predicates are either a definite 
common noun, or a personal noun, or a personal or demonstrative pronoun. 


(6) I Luisa ya agatadag. 
PERS Luisa DET standing 
‘Luisa is the one standing.’ 


(7) I Dr. Carreon ya wahi=k. 
PERS Dr. Carreon DET sibling=GEN.1s 
‘Dr. Carreon is my sibling.’ 


The examples above have specific names, (6) Luisa and (7) Dr. Carreon, which are introduced by the 
nominal marker i. The nominal marker is primarily used to identify the entity expressed in the 
nominative noun phrase. In other cases, an independent personal pronoun can also be a predicate in an 
identificational nominal clause as in example (8). 


(8) Yakan ya nangiddan kan bahat. 
ABS.1s DET PERF-give OBL banana 
‘I was the one who gave banana.’ 


c. Quantificational nominal clause 
A quantificational clause begins with a quantifier, usually a numeral which quantifies the entity 
expressed by the nominative complement, as in examples (9) and (10): 


(9) Lima kilo nga karne yo ginatang na kam=palengke. 
five kilos LIG meat DET bought ERG.3s OBL=market 
‘He/she bought 5 kilos of meat at the market.’ 


(10) Tallu kansyon kiningwa=k. 
three song PERF-make=ERG.1s 
‘I made/wrote three songs.’ 


Clause (9) begins with a quantifier, specifically a unit of measurement used to refer to the 
noun karne. What seems distinct in clause (9) is the peripheral argument kampalengke, which seems 
to involve the blending of the locative term kang with the noun palengke, resulting in a one-word 
locative phrase: kampalengke. Clause (10) begins with a quantifier referring to the noun kansyon. The 
clause exhibits encliticisation of the pronominal ku to the verb kiningwa, resulting in kiningwak. 


(11) Tanga-supot nga baggat yo iniddan=na. 
one bag/pack LIG rice DET PERF-give=ERG.3s 
‘A bag of rice is what she/he gave.’ 


In example (11), the measurement is given in terms of supot ‘one 
bag/one pack’, which is usually distributed as a relief item. Among the Itawit speakers, the traditional 
measurement of rice was in terms of salop and then eventually in terms of kilos. As seen in clause 
(11), tanga-supot is a measurement expression in Itawit. 

It should also be noted that quantificational words may also include time numerals, as in examples 
(12) and (13): 


(12) Alas-sais kan gabi messimu yaw nga programa. 
6 o’clock OBL night will.start DEM LIG program 
‘The program will start at 6 o’clock in the evening.’ 


(13) Alas-dose nak mallubet. 
12 o’clock ABS.1s IMP-go.home 
‘I will go home at 12 o’clock.’ 


d. Possessive nominal clause 

Possessive nominal predicates are a subclass of identificational predicates. They may contain a 
possessive pronoun, a genitive or a locatively marked noun phrase interpreted as a possessor in the 
predicate position. 


(14) Kwak yaw nga libro. 
POS.1s SPA.PROX LIG book 
‘This book is mine.’ 


Clause (14) involves an absolute possessive, glossed as POS first person singular, in the initial 
sentence position, which makes it a possessive nominal clause. Clause (15) is similar, 
except that the possessive word kwa is followed by the possessor Mandy, signaled 
by the determiner i. The sentence also uses a proximal spatial demonstrative, glossed as SPA.PROX. 


(15) Kwa i Mandy yaw nga asassanat. 
POS GEN Mandy SPA.PROX LIG doll 
‘The doll belongs to Mandy.’ 


3.1.2 Adjectival Clauses 

Dita (2007) maintains that Ibanag has a lexical category ‘adjectives’, which is also observed in Itawit. 
Seen as bare and derived adjectives, they head adjectival clauses. Below we take a close look at the two 
kinds of adjectival predicates: qualificational and comparative adjectivals. 


a. Qualificational adjectival clauses 

These items provide a description to the nominal subject. As adjectives, their main role is to qualify the 
NPs. Clauses (16) and (17) are examples of bare or unaffixed adjectival predicates, while clauses (18) 
and (19) are examples of affixed or derived adjectives. 


(16) Gwapa ne babay kanne kanto. 
beautiful DET girl OBL corner.street 
‘The girl at the corner of the street is beautiful.’ 


(17) Dakal ya vulan sangaw. 
big DET moon now 
‘The moon is big now.’ 


(18) Nakasta ya sinnun na. 
nice DET dress GEN 
‘Her dress is nice.’ 


(19) Sissingngat ya gulay. 
very.delicious DET vegetable 
‘The vegetable is very delicious.’ 


It must be noted that the nouns being described are introduced by nominal markers, as in (16) ne babay, 
(17) ya vulan, (18) ya sinnun, and (19) ya gulay. 


b. Comparative adjectival clauses 


When a comparative adjectival clause is used, it usually describes two or more entities. Examples (20) 
and (21) show the comparative degree and superlative degree of the adjective. 


(20) Mas narenu ya bale=m annet cha bale=k. 
COMP clean DET house=GEN.2s than LOC house=GEN.1s 
‘Your house is cleaner/tidier than my house.’ 


Clause (20) uses the comparative degree of the adjective, mas narenu, in the sentence-initial position. 
What is distinct in the utterance is the encliticisation of the second-person singular pronoun mu to the host word 
balay (balay+mu=balem) and of the first-person singular pronoun ku to the host word balay 
(balay+ku=balek). Both words seem to have undergone a phonological change called 
monophthongization of the original diphthong ay; hence, they become balek and balem. Recall that in 
Ibanag, the GEN.1s is covert, being represented only by a glottal stop, but in Itawit, 
the GEN.1s appears to be overt, as seen in clause (20). 
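
The change described for clause (20) can be sketched as a small string rule. This is only an illustration under stated assumptions: the function name `encliticize_ay` is hypothetical, the rule is limited to hosts ending in ay combined with the pronouns ku and mu, and the reduced clitic is taken to be the initial consonant of the pronoun, as in balek and balem.

```python
def encliticize_ay(host: str, pronoun: str) -> str:
    """Attach a reduced genitive enclitic to a host ending in the diphthong 'ay'.

    Hypothetical helper for illustration only. It covers just the pattern
    reported for clause (20): balay + ku -> bale=k, balay + mu -> bale=m.
    """
    if pronoun not in ("ku", "mu"):
        raise ValueError("this sketch only covers the pronouns 'ku' and 'mu'")
    if not host.endswith("ay"):
        raise ValueError("this sketch only covers hosts ending in 'ay'")
    # Monophthongization: the final diphthong 'ay' surfaces as 'e'.
    stem = host[:-2] + "e"
    # Assumption: the enclitic reduces to the pronoun's initial consonant.
    return stem + "=" + pronoun[0]


for pronoun in ("ku", "mu"):
    print("balay +", pronoun, "->", encliticize_ay("balay", pronoun))
```

The `=` in the result follows the clitic-boundary notation of the interlinear examples; hosts that do not end in ay are deliberately left out of scope.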


(21) I Saling ya kapianan kan ira ngammin nga maggurwahi. 
PERS Saling DET most.intelligent OBL PLU all LIG siblings 
‘Saling is the most intelligent among the siblings.’ 


Utilizing the superlative degree of the adjective, clause (21) involves the item kapianan, which makes 
the clause a comparative adjectival clause. 


3.1.3 Existential Predicate Clause 

As the term suggests, existential clauses may express the existence of something or may express 
possession of something. These constructions also have their negative counterparts. Clauses (22), (23) 
and (24) show existence, while clauses (25), (26) and (27) show their negative counterparts. 


(22) Hinian dua saku nga baggat kang kusina. 
EXI two sacks LIG rice OBL kitchen 
‘There are two sacks of rice in the kitchen.’ 


(23) Hinian furaw nga bahuy cha lawan. 
EXI white LIG pig LOC outside 
‘There’s a white pig outside.’ 


(24) Nian relief goods/rasyon kattu kabi. 
EXI relief.goods yesterday 
‘There were relief goods that arrived yesterday.’ 


The first two examples, (22) and (23), use the existential word hinian, while the last example (24) uses 
nian, which seems to be a shortened version of the existential word hinian. Examples (25), (26), and 
(27) express a negative existential meaning. 


(25) Awan tallung nu Lunit. 
NEG.EXI class FUT Monday 
‘There will be no classes on Monday.’ 


(26) Awan kwartu=k. 
NEG.EXI money=GEN.1s 
‘I don’t have money.’ 




Papers from SEALS 30 — Ayunon and Dita 


(27) Awan kami kan unag balay. 
NEG.EXI ABS.1pe LOC inside house 
‘We are not inside the house.’ 


3.1.4 Prepositional Clause 

Reid and Liao (2004) stated that prepositional phrases may be heads of clausal predicates, hence, the 
term prepositional predicate constructions. They maintain that prepositions are found in languages 
throughout the Philippines as prepositional heads of clausal predicates. The preposition para is used to 
signal the benefactive role, as in examples (28) and (29): 


(28) Para cha anak=ku yaw nga sassanat. 
for LOC child=GEN.1s DEM LIG doll 
‘This doll is for my child.’ 


(29) Para kannikayu yaw nga kansyon. 
for OBL.2pe DEM LIG song 
‘This song is for you.’ 


3.1.5 Locative clauses 

The locative phrase in the following sentences is introduced by the locative particle kang. Specifically, 
locative phrases, as Dita (2007) explains, can be a specific name of a place, or a spatial location as seen 
in examples (30), (31) and (32). 


(30) Kang Disneyland da nga nabbakasyon. 
OBL Disneyland ABS.3p REL went.on.vacation 
‘They went on a vacation to Disneyland.’ 


(31) Kang Linao nak mattrabahu. 
OBL Linao ABS.1s work 
‘I work at Linao.’ 


(32) Kang eskwela na nga natafulan ya sakkalang=na. 
OBL school GEN.3s LIG found DET ring=GEN.3s 
‘She found her ring at school.’ 


These examples indicate the use of the locative phrase at the beginning of the sentences. Specifically, 
the locative phrases express a specific name of a place (Disneyland, Linao) and a spatial location 
(eskwela). 

In some cases, deictic pronouns may be used as locatives, as in examples (33), (34) and (35). 


(33) Kanyaw ta massimmu. 
here ABS.1pi will.meet 
‘We will meet here.’ 

(Note: Speakers from Amulung use kanyo instead of kanyaw.) 

(34) Kanyo ta massimmu. 
here ABS.1pi will.meet 
‘We will meet here.’ 


(35) Kannay da naggungut. 
there ABS.3p fought 
‘They fought there.’ 


It should be noted that speakers from Amulung, an Itawit-speaking municipality, may say the same 
statement this way. In clause (36), kanne is probably the shortened version of kannay, while ra is likely 
the shortened version of ira, the third-person plural absolutive. 


(36) Kanne ra naggungut. 
there ABS.3p fought 
‘They fought there.’ 


When locative phrases are introduced by a deictic pronoun, the locative particle may be dropped, as in 
examples (37) and (38). 


(37) Kanyaw balay nak nahanak. 
here house ABS.1s gave.birth 
‘I gave birth at home.’ 


(38) Kanyo bale=nak nanak. 
here house=ABS.1s gave.birth 
‘I gave birth at home.’ 


Note that clause (38) is an utterance from an Amulung speaker who used kanyo as a shortened version 
of kanyaw and nanak as a shortened version of nahanak. 


(39) Kattuna Manila nak maggatang kang regalu=k. 
there Manila ABS.1s will.buy OBL gift=GEN.1s 
‘I will buy my gift there in Manila.’ 


The above discussion matches Reid and Liao’s (2004) argument that Philippine languages follow a right-branching 
clause structure. That is, clausal constructions usually begin with the predicate, while the 
nominal complements, adjuncts and other modifiers follow the predicate. 


3.2 Verbal Clauses 
Since Philippine languages are generally VSO languages, a sentence in any Philippine 
language typically starts with a verb functioning as the predicate, followed by 
the nominal and verbal complements. These constructions are called verbal clauses 
because they are usually headed by verbs that occupy the initial position in the clause. Two types 
of verbal constructions are presented here: intransitive and transitive constructions. 

Transitivity in Philippine languages is determined by the type of complements given to the verbs. 
As stressed by Reid and Liao (2004), it is the type of complements that a verb takes that determines its 
transitivity, not the number. Since this study analyzes a Philippine language spoken in northern Luzon, 
transitivity will be determined by the types of complements given to the verb. On the other hand, 
valency refers to the number of core arguments that a clause has. When a clause has one core argument, 
it is called monadic or monovalent. When it takes two core arguments, it is called dyadic or bivalent; 
when it has three core arguments, it is referred to as triadic or trivalent (Dixon and Aikhenvald 2000). 
For instance, in Itawit, a monovalent construction contains only one core argument, which may be the 
actor or experiencer in the clause. As core arguments, the pronominal or the nominal marker is encoded 
by the absolutive case. 
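
The valency terms above can be summarized in a short sketch. The function name `valency_label` is a hypothetical helper introduced only for illustration; it simply maps a count of core arguments to the terminology cited from Dixon and Aikhenvald (2000).

```python
def valency_label(core_arguments: int) -> str:
    """Return the valency term for a clause with the given number of core arguments.

    Hypothetical helper for illustration; only the three values named in the
    text (one, two, or three core arguments) are covered.
    """
    labels = {
        1: "monovalent (monadic)",
        2: "bivalent (dyadic)",
        3: "trivalent (triadic)",
    }
    if core_arguments not in labels:
        raise ValueError("only one, two, or three core arguments are labeled here")
    return labels[core_arguments]


for count in (1, 2, 3):
    print(count, "->", valency_label(count))
```

Note that the sketch counts core arguments only; it says nothing about transitivity, which depends on the type of complement rather than the number.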


3.2.1 Intransitive Construction 

Reid and Liao (2004) define an intransitive construction as having a verb with only a single nominal 
complement. This single complement is referred to as the core argument, an argument that is needed to 
complete the meaning of the sentence. Such a complement in this type of construction may encode either 
the actor or the undergoer. An intransitive construction may have one, two or several complements 
as peripheral arguments, but it only has one core argument. Additionally, there are clauses which are 
semantically intransitive. Dita (2007) termed these ambient clauses in Ibanag. This is also evident in 
Itawit utterances. 


a. Ambient Clauses 

Ambient clauses do not exhibit core arguments. As explained by Dita (2007), these are semantically 
intransitive constructions as they refer to temporal states and hence may not require any accompanying 
nouns. Consider clauses (40) and (41) provided by Dita (2007:49). 


(40) Magguran. 
IMP-rain 
‘It’s raining.’ 


(41) Nabbaddyu. 
PERF-storm 
‘It stormed.’ 


Similar constructions are also observed in Itawit. See examples (42), (43) and (44). 


(42)  Nakkillakit. 
PERF-lightning 
‘There was lightning.’ 


(43) Nallunig. 
PERF-earthquake 
‘There was an earthquake.’ 


(44)  Mapafuk. 
CONT-drizzle 
‘It’s drizzling.’ 


b. Monovalent (monadic) intransitive 

Itawit utterances also exhibit monovalent constructions, which contain only one core argument. 
Pronouns, like NPs, can also serve as core arguments. As core arguments, pronominals or nominal 
markers are encoded by the absolutive case. Clauses (45) to (48) exhibit the use of NPs without 
adjuncts after them. All of these clauses have only one core argument, which is marked as ABS. The 
core argument in clause (45) is a genderless pronoun in Itawit, iggina, which comes after the verb in 
the past tense naddangot, while clause (46) has the third-person plural pronoun as the core argument. 
Clauses (47) and (48) have nominal core arguments introduced by the nominal marker yo. 


(45) Naddangot iggina. 
PERF-cry ABS.3s 
‘S/he cried.’ 

(46) Nakkarela da. 
PERF-run ABS.3pe 
‘They ran.’ 

(47) Naggalak yo abbing. 
PERF-laugh ABS child 
‘The child laughed.’ 

(48) Nanadag yo babay. 
PERF-stand.up ABS girl 
‘The girl stood up.’ 


NPs may also exhibit adjuncts of time, place and manner after them. Examples (49) and (50) have 
pronominal NPs with adjuncts after them. These include the pronominals ka and nak, which are placed 
after the verbs mazzihut and mattrabahu. These pronominals serve as the core arguments in these clauses, 
so they are in the absolutive case. Notice that the adjuncts of time appear after the pronominal 
complements. 


(49) Mazzihut ka akkinagalgaw. 
take.a.bath ABS.2s every.day 
‘(You) take a bath every day.’ 


(50) Mattrabahu nak sonu umma. 
will.work ABS.1s FUT tomorrow 
‘I will work tomorrow.’ 


Dita (2010) explains that, in Ibanag, adjuncts of time, place and manner may appear after the noun 
complement. In Itawit, as seen in clause (51), the adverb appears before the verb. The word gavvat, 
which is placed at the beginning of the sentence before the verb, functions as an adverb of time. 


(51) Gavvat naddangot i Maria. 
suddenly cried ABS Maria 
‘Maria suddenly cried.’ 


c. Bivalent/Divalent (dyadic) intransitive 

A bivalent intransitive construction has two nominal complements: actor/experiencer and 
theme/patient. The theme refers to an NP that expresses an entity which is in a state or a location or which 
is undergoing a motion (Trask 1993). Such an entity is always non-human, either animate or inanimate. 
This is different from the patient, which functions the same way as the theme but is human and/or 
animate. In case-marking the argument, the actor in a bivalent intransitive construction is always 
marked as absolutive (ABS), and the theme is marked as oblique (OBL). The nominal complement in 
bivalent intransitive construction may take the form of a pronominal or a full NP. 


With an absolutive pronoun 

This type of construction has only one core argument, which is encoded by the absolutive 
pronoun. The core argument in clause (52) is the first-person singular pronominal nak, while clause 
(53) has the third-person genderless pronoun iggina; both are encoded as absolutive. The themes in the 
following clauses, kurtina ‘curtains’ in (52) and baggat ‘rice’ in (53), are encoded by the oblique kang, 
which means they are not core arguments but peripheral arguments. 


(52) Nabbambal nak kang kurtina kattuna kayan. 
washed ABS.1s OBL curtain OBL river 
‘I washed curtains in the river.’ 


(53) Nakkokot iggina kang baggat. 
stole ABS.3s OBL rice 
‘She/he stole rice.’ 


With an absolutive full NP 

A bivalent intransitive construction may also have a full NP instead of a pronominal as a nominal 
complement. Full NPs are typically actors in this type of clause, and they can be substituted by a 
pronominal. The themes are also encoded by the oblique marker kang. Examples (54) and (55) show a 
bivalent construction with an absolutive full NP: i nanang in (54) and i wahik in (55). The absolutive 
full NP is the only core argument in the sentence. The themes, bahuy ‘pig’ and Mondonggo (an authentic 
Itawit dish), and the location kusina ‘kitchen’ are likewise encoded by the oblique kang. 


(54) Naggatang i nanang kang lima nga bahuy. 
bought ABS mother OBL five LIG pig 
‘Mother bought five pigs.’ 

(55) Nangan i wahi=k kang Mondonggo kang kusina. 
ate ABS sibling=GEN.1s OBL Mondonggo OBL kitchen 
‘My sibling ate Mondonggo in the kitchen.’ 


3.2.2 Transitive Constructions 

An Itawit transitive construction, unlike an intransitive one, requires two core arguments: the agent and 
the patient. In this case, the agent is in the ergative case, and the patient, which is always a human 
complement, is in the absolutive case. A transitive construction may be either bivalent or trivalent. The following 
section discusses bivalent and trivalent transitive constructions. 


a. Bivalent / divalent (dyadic) transitive 

Bivalent transitives contain two core arguments which could be full noun phrases or plain pronominals. 
As previously mentioned, the agent is case-marked as ergative, and the patient is case-marked as 
absolutive, while the other peripheral arguments present are case-marked as oblique. 
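
The case-marking pattern described so far can be sketched as a simple check over the case labels of an interlinear gloss. The function name `clause_transitivity` and the list-of-labels input are assumptions made for illustration; the rule encodes only the statement that a transitive clause pairs an ergative agent with an absolutive patient, whereas an intransitive clause has a single absolutive core argument.

```python
def clause_transitivity(case_labels):
    """Classify a clause from the case labels found in its gloss line.

    Hypothetical helper for illustration. Labels are expected in the style of
    the glosses, e.g. "ERG.3s", "ABS.1s", "OBL"; person/number suffixes after
    the first dot are ignored.
    """
    cases = {label.split(".")[0] for label in case_labels}
    if "ERG" in cases and "ABS" in cases:
        # Ergative agent plus absolutive patient: two core arguments.
        return "transitive"
    if "ABS" in cases:
        # A single absolutive core argument; OBL phrases are peripheral.
        return "intransitive"
    # Clauses without a case-marked core argument (for example, ambient
    # clauses) are outside the scope of this sketch.
    return "undetermined"


print(clause_transitivity(["ABS.3s", "OBL"]))     # labels as in example (53)
print(clause_transitivity(["ERG.3s", "ABS.1s"]))  # labels as in example (59)
```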


With two full NPs 

In a bivalent transitive construction, the two full NPs are the agent and the patient, which are the core 
arguments; hence, they are marked as ergative and absolutive respectively. If the clause has other NPs, 
they are peripheral arguments which are case-marked as oblique. 


(56) Inuffunan i mestra/tru ya abbing. 
helped ERG teacher ABS child 
‘The teacher helped the child.’ 


(57) Pinakan i tatang ne makilelimut. 
fed ERG father ABS beggar 
‘Father fed the beggar.’ 


With two pronominals 


(58) Nessussuk da iggina. 
hid ERG.3p ABS.3s 
‘They hid him/her.’ 


(59) Kinanna=na yakan. 
hit=ERG.3s ABS.1s 
‘He/she hit me.’ 


With pronominal ergative and full-noun absolutive complements 


In this construction, the core arguments may be a combination of an ergative pronominal and an 
absolutive full NP. 


(60) Inallangngan=na ya anak=na kattu kabi. 
scolded=ERG.3s ABS child=GEN.3s OBL yesterday 
‘She scolded her child yesterday.’ 


(61) Netoli da yaw assassanat. 
returned ERG.3pe DEM doll 
‘They returned this doll.’ 

(62) Padayawan tayu i Afu! 
worship ERG.1p ABS God 
‘Let us worship God.’ 


(63) Hinaradaral=na ya kofun=na kanya kiklase=na ira. 
destroyed=ERG.3s ABS friend=GEN.3s OBL classmates=GEN.3s PLU 
‘S/he destroyed his/her friend to his/her classmates.’ 


With full noun ergative and pronominal absolutive complements 
This construction has an agent which is a full NP and a patient which is pronominal. 


(64) Nassingan ne Pedro ira. 
saw ERG Pedro ABS.3p 
‘Pedro saw them.’ 


(65) Nessussuk ne bagitolay iggina. 
hid ERG young.man ABS.3s 
‘The young man hid her/him.’ 


b. Trivalent (triadic) transitive 


This kind of construction has three core arguments: the agent and the benefactive, which are human, 
and the theme, which is non-human. Here, the agent is case-marked as ergative, the theme as absolutive, 
and the benefactive as oblique. Let us consider the following examples. 


With three full NPs 
There are three core arguments in clauses (66) and (67), which take the form of full NPs. 


(66) Nepangwa i angkel Jimmy kan langkapi ne anak na. 
built ERG uncle Jimmy ABS bed OBL child GEN.3s 
‘Uncle Jimmy built a bed for his child.’ 


(67) Nelutu ne anti Suping kan zinagan para kanne mestru=k. 
cooked ERG aunt Suping ABS dinuguan for OBL male.teacher=GEN.1s 
‘Aunt Suping cooked dinuguan for my male teacher.’ 


With two pronominals and a full NP 
In clauses (68) and (69), the agent and the benefactive are pronominals, while the theme is a full NP. 


(68) Netturat nak kan kasyon para kan ikayu ngammin. 
wrote ERG.1s ABS song for OBL you all 
‘I wrote a song for all of you.’ 


(69) Gumatang ka kan espeho para kaniggina. 
buy ERG.2s ABS mirror for OBL.3s 
‘Buy a mirror for her/him.’ 


6 Conclusion 

This paper has described and analyzed the types of clauses in Itawit, a language
spoken by nearly 189,000 people in Northern Luzon. Nonverbal clauses in Itawit are headed by a
constituent which does not belong to the category of verbs. Verbal clauses, on the other hand, are usually
headed by verbs that occupy the initial position in the clause. The analysis of the Itawit data supports
Reid and Liao’s (2004) argument on the right-branching clause structure of Philippine languages.
Moreover, the paper affirms the finding that transitivity in Philippine languages is determined by the
type of complement the verb takes, while valency refers to the number of core arguments that a clause
has. The paper has distinguished intransitive constructions from transitive constructions. An
intransitive construction has a verb with only a single nominal complement. It may have several
complements as peripheral arguments, but it has only one core argument. An intransitive construction
may thus be monovalent, with one core argument as the actor, or bivalent, with two nominal
complements, actor/experiencer and theme/patient. An Itawit transitive construction, unlike an
intransitive one, requires two core arguments: the agent and the patient.

Abstract

The Idu language is spoken in the extreme northeast of Arunachal Pradesh. Its traditional 
terrain stretched between the Tibetan Plateau and the plains of the Brahmaputra. The Idu 
traditionally lived in highly dispersed settlements. As a consequence, orientation was 
determined more by up versus down as well as by the sides of the many rivers which cross 
their area. The paper describes the Idu lexicon used to describe directions. The
morphosyntax of directionals is quite mixed; some can be identified as deictics with
up/down semantics, while others are best described as specialized adverbs. In addition,
there are verbs of motion which mark directionality. All these forms incorporate the
same semantic frame. There is probably no distinct morphosyntactic category of 
directional. The paper illustrates each term with example sentence contexts and represents 
visually directional oppositions as they appear to speakers. Historically, it seems unlikely 
that Idu recognized cardinal directions, North/South etc., but the existing lexicon has been 
interpreted in terms of modern terminology, which sometimes creates confusion for 
speakers. 


Keywords: Idu; directionals 
ISO 639-3 codes: clk, byw, bod, mlv, wno 


1 Introduction: directionals 

The observation that the natural environment in which people live has a strong impact on the 
grammatical and semantic systems they have evolved goes back at least to Sapir (1912). However, the 
literature on directionality or topographical deixis is quite sparse, partly because the languages where
this has typically been studied are spoken by plains or maritime peoples. Such languages focus on spatial
terminology, and there are a number of descriptions of orientation in Austronesian, for example
(Ozanne-Rivierre 1997; Bennardo 2002; Alexandre 2003 for Mwotlap [mlv]; Burung 2013 for Wano 
[wno]). Schapper (2014) deals explicitly with elevation in the Alor-Pantar languages. Although Africa 
has plenty of montane areas, few descriptions exist of topographical deixis. However, see Wolff (2006) 
which discusses some of the languages of the Mandara mountains in northeastern Nigeria. In the 
Himalayan region, few studies have described directionals, but see Bickel (1997) for Belhare [byw]; 
Caplow (2007) for Tokpe Gola Tibetan [bod]. Similarly, there have been some general considerations 
of spatial coordinate systems such as Dixon (2003) and Burenhult (2008). 

This paper concerns the directionals and other deixis of the Idu people, who live in the northeast 
of Arunachal Pradesh, itself in the extreme northeast of India. The Idu language resembles Trans- 
Himalayan typologically, although any genetic relationship is yet to be demonstrated. The term 
‘Mishmi’, often found in the literature and still current among the Idu in some contexts, is used in the 
travel literature as far back as the early nineteenth century to refer to three distinct peoples, the Idu, 
Tawra [=Taraon] and the Geman [=Miju]. The common name ‘Digaru’, the name of a major river, is 
also in use. Culturally speaking, these two languages were historically grouped with Kman, as the 
Mishmi. The Idu live principally in Dibang Valley District with some settlements in Lohit and E Siang 
in Arunachal Pradesh. The ‘Upper’ Idu are known as ‘Luoba’ or ‘Khoba’ in China where there are a 
few villages. The 1971 census in India recorded around 7,700 individuals self-identifying as Idu 
Mishmi, although this is no measure of language competence.

Ironically, given there is only a small scattering of villages in Tibet, the technical literature is largely in
Chinese, including at least one full-length grammar. Publications include Sun et al. (1980), Sun (1983a
and b, 1999), Ouyang (1985) and Jiang (2005).¹ An extensive dictionary and draft grammar based on
the form of the language spoken in India can be found on the author’s website,² while Blench (2019)
summarizes the phonology and orthography used in this paper.

The Idu live in vertiginous territory, extending from the plains of the Brahmaputra Valley to Tibet. 
However, prior to the earthquake of 1951, there were almost no settlements on the plain, and the general 
pattern was dispersed settlement along six main river valleys (Baruah 1988; Bhattarcharjee 1983). In 
this area of Arunachal Pradesh the inclines are particularly steep, and almost all movement would have 
been up and down the mountainside. Unlike some of the neighbors of the Idu, such as the Tani, who 
spread around the angle of the mountain range, ‘up’ would always have been North for the Idu and 
‘down’ would have been South. The river valleys which cut across Idu territory were the main axes of 
communication, and the rope-bridges which used to span the major rivers were vital to trade and social 
intercourse. As a consequence, the Idu language evolved to reflect this pattern of orientation, which is 
now somewhat discordant with their post-earthquake geography. Following the massive destruction, 
many villages and individuals moved down to the plain to live among the Assamese and Nepalis 
resident there. 


Map 1: The territory of the Idu 


Key: INDIA = nation state; Adi = ethnic group; further symbols (not reproduced) mark major Idu
settlements, the international boundary, and Idu territory.


1 These publications are accessible and indeed available for sale on the internet. I cannot read Chinese, but a
combination of English glosses and example sentences means that it is possible to establish general
correspondences with data from the Indian side. The Chinese notation of tones seems to correspond broadly
with those recorded for the present study.

2 Idu resources (rogerblench.info)

Map 1 shows the approximate extent of Idu today. It has recently been reported that the few Idu villages
in China (where they are known as Lhoba) have been forcibly removed further inland, and that attempts 
are being made to Sinicize this small outlying population. 

As a consequence of their geographical situation, Idu has developed an extensive set of 
directionals, adverb-like forms which incorporate deixis. These ‘topographical deictics’ are reported for 
Tani languages (Post 2011, unpublished) and are likely to be prominent in cultures living in steep 
environments, where ‘up’ and ‘down’, ‘north’ and ‘south’ are more relevant than conventional cardinal 
directions. 

This paper³ describes the movement verbs marked for direction, the basic deictics of Idu and then
goes on to exemplify the different types of directional, based on orientation towards the mountains and 
plains, the rivers, and the village. It concludes there is a strong relationship between this lexical richness 
and the physical environment of the Idu. 


2 Movement verbs marked for direction 

The natural environment the Idu people inhabit ensures that there is a semantic merger between climb 
and other more usual forms of ascension. So ‘climb’ and ‘go up’ use the same term marking elevation, 
but do not incorporate directionality, as shown in Sentence 1. 


1. shi ‘to climb, ascend, go up’ 
aya imu éya shi.ga 
that man mountain  climb.ing 
‘That man is climbing the mountain.’ 


Similarly, and more alarmingly, ‘fall, go down, descend’ are also merged into one verb. This has 
particular resonance for Idu speakers, as falling off a mountain is a particular type of death for which a 
specific level of the underworld is reserved. As sentence 2 indicates, the verb is associated with the soul 
leaving the body. 


2. cépd(t6) to fall, go down, descend
ngd  ayd   né    copoto  aba   né   ilingaayanga  ba
I    hill  from  fall    PERF  and  soul          go
“I fell from the hill and my soul left my body.”


3 Deictics

Idu conventional deictics do not form part of the directional system and correspond quite well to those 
in English, as they are not marked for direction or orientation. The following examples of spatial deictics 
and demonstratives provide a brief summary of their use. 


3. ala here 
naba ala jt.gayi 
Father here sit.PRES 
“Father is sitting here” 


3 Data for the paper was gathered in a series of field trips to Idu territory, 2015-2020. I would like to thank Mite
Lingi and Hindu Meme both for help with working on understanding directionals and developing the
transcription of Idu.

4. ahi there
ahi    ma   iséya  mé-  deé.ga?
there  LOC  who    AFF  stand.ing
“Who is standing there?”


5. éca this
écad  yu    aka        taci  puma
this  beer  be strong  very  really
“This rice-beer is very strong”


6. aya that
aya   imu  habrii        mbra
that  man  eat too much  very
“That man is a glutton”


4 Directionals 
It is in the field of directionals that Idu is particularly rich. Directionals can be subdivided into five 
major semantic categories, summarized in Table 1. 


Table 1: Categories of Idu directionals 


Category             Comment
Verticality          up/down in relation to the position of the speaker
Cardinal directions  North/South etc. (though there is evidence these are modernised
                     interpretations of more traditional terms)
Rivers               upstream/downstream
Villages             towards the upper/lower part of the village
Handedness           right/left


Table 2 lists the lexemes associated with Idu directional categories and these are systematically 
exemplified in the following sections. 


Table 2: Lexemes used in Idu directionals 


Category     Idu               Gloss
Verticality  adri              straight up (speaker is on the ground)
             ama               straight down (e.g., speaker is in a tree)
             ayuma             downwards
             ayumanytl         towards downwards
             told              upwards
             éto(1o)nytt       towards upwards
Cardinals    ald, yalé         upwards, North
             (y)al6nyt         northern side
             atu               up there North
             atudri            up there on top, high up
             atuya             there upwards there North (close)
             ama               down South
             amaya             there South, downwards (close)
             amanyu            southwards
             api               on the south side, down there South (remote)
             ahi               over there East or West
             ahiya             here East or West (close to speaker)
             ahila             there East or West (close to speaker)
Rivers       ano               downstream
             arhé              upstream
             ahinyi            on the other side
             (maci) hrégényti  on the other side (esp. rivers)
             (maci) ékényi     on this side (esp. rivers)
             élanu             on this side
             éwanyu            on this side (Hill dialect)
             Tlin(y)a          on my side
Villages     anggodca          towards the upper part of the village
             anggdpo           towards the low-lying part of the village
Hand         écanyui           right side
             lakényti          left side


Two aspects of Idu syntax relevant to the directionals can be noted here. Word order in Idu is fairly 
free, with prominence or focus denoted by fronting locatives and qualifiers. Idu frequently omits subject 
pronouns where these can be inferred from context, and thus directionals are often in initial position in 
a clause or sentence. Idu has a large array of locative markers, both bound and unbound, which often 
have the effect of multiplying the marking of position or orientation of subjects and speakers. 


Verticality 

The Idu environment is montane, so it is frequently necessary for a speaker to point up and down to a 
hearer to indicate the position of something. However, trees are also the dominant vegetation type, and 
must often be climbed for their fruit and other products, hence the vertical directionals exemplified 
below. Sentences 7-10 provide examples of these terms in use. 


7. adri straight up, up there (e.g., if you are on the ground)
aya   adri  acdpu  akha  aba  4a
that  up    shelf  keep  IMP  AFF
“Keep that on the shelf there”


8. ama straight down, down from (e.g., if you are in a tree)
dsimbo.aneé  amd    andonggd  do.dja    cho!
tree.LOC     there  down      jump.IMP  HORT
“[You], jump down from the tree!”


9. étonyi upwards (up from the ground)
éetonyu  shit.hi.mi.yi
upwards  climb.can.NEG.PRS
“It is hard to climb upwards”


10. ayumanyti downwards, down from there
ay4    mané  ayumanyu   éb0d.aja  go    chi.pra.pra.yi
there  LOC   downwards  fall.IMP  from  walk.good.is.PRS
“It is pleasant to walk down from [the mountain]”

The mountain is not explicitly mentioned, but an Idu speaker would infer this from the context.


Figure 1 presents the vertical directionals as a graphic.

[Figure 1 not reproduced; the only surviving label is ama.]


Cardinal directions 

The European system of cardinal points is quite a recent adoption in Idu and has been superimposed on 
a previous system which essentially marked upwards (i.e., towards Tibet) and downwards (towards the 
Assam plains). Although speakers translate the directionals using ‘North’ and ‘South’ today, this is a 
modern gloss on an initially quite distinct system of orientation. Sentences 11-17 provide examples of 
these terms in use. 


11a. al6 North upwards [yal6 in Upper dialects]
ngd  alo    Anini  né    ja
I    north  Anini  from  come down
“I came down from Anini”


11b. al6 North upwards [omitting subject pronoun]
alo    mraa  né    ja?
north  hill  from  come?
“Have you come down from the hill?”


11c. yal6 [Upper dialect]
Mili   yalo      kha.gayi
Hunli  up there  lie.PRS
“Hunli is up there”


12. atti up there North
nga  aliya    atu       anggoca     ji.gayi
my   brother  up there  north side  live.PRS
“My brother is living up there in the upper part”


13. atudri up there, on top
atudrt    imtiduma  pra   li.gayi
up there  sky.LOC   bird  fly.PRS
“Up in the sky, a bird is flying”


14. atttya there upwards North
atuya  mocd  pra.m.né     hé!
there  near  COP.AFF.LOC  EXCL
“It is very near up there!”


15. amaya there downwards, South
amaya  ga           d
there  go and come  LOC
“Go there and come back”


16. api from the south side, down there South
api    nyu.ne   itt   a   yi?
south  you.LOC  come  QM  AFF
“Have you come from the South?”


17. yalonyt north (downwards)
yalonyu        né    li.gd.a     jayi
northern side  from  fly.PL.FOC  come.PF
“They flew down from the northern side”


Idu did not originally distinguish East and West, as there was only a generic term which meant 
‘sideways’. If modern cardinal directions are required, they are expressed with borrowings from 
English. Note that degree of remoteness from the speaker is lexically specified, in contrast to the 
North/South distinctions. Sentences 18-21 provide examples of these terms in use. 


18a. ahi sideways, over there East or West, close to speaker
ahi    imu     khaga  dé.gayi
there  person  one    stand.PRS
“One person is standing there”


18b.
ahi    ma   istyame  deé.ga?
there  LOC  who      standing
“Who is standing there?”


19. ahinyd that side, the direction you are facing (East or West only)
adhinyi  ba.ba   himi.ya
there    go.IMP  be able.AFF
“Go over to that side”

The following two terms in 20 to 21 appear to be interchangeable.


20. ahila sideways, there East or West, remote from speaker
ahila  iki  khaga  si   téne  kha.ga   ma
there  dog  one    die  CONN  lie.PRT  AFF
“There’s a dead dog over there”


21a. ahiya sideways, there East or West, remote from speaker
ahiya  dsimbo.d  pra.a    khaga  ndo.gayi
there  tree.LOC  bird.SG  one    perch.PRS
“A bird is perching on that tree over there”


21b.
ahiya  isiva.ga   0.0        a?
there  whose.LOC  house.LOC  QM
“Whose house is that over there?”


Figure 2 shows the directionals exemplified above in relation to the mountains and the plains. 


Figure 2: Directionals in relation to mountains/ plains and cardinal points 


                  North
                alo   atu
West    ahi       EGO       ahi    East
              amaya   api
                  South


Rivers 

Idu terrain is heavily dissected by rivers, which are usually dry in the later part of the year, but flooded 
suddenly following the snowmelt on the Tibetan Plateau. Although some of these have been bridged 
recently, and risky crossings using heavy iron cables have been possible for a century or more, being 
on the right side of a river at a given time takes on great importance in Idu life. Hence, the terminology
marks both the direction of flow from the source to the mouth and the side of the river where the
speaker is. Sentences 22-29 provide examples of these terms in use.


22. ano downstream (towards the mouth)
maci   and         dunyii  ba  chd
water  downstream  side    go  HORT
“Let’s go downstream”


23. arh6 upstream (towards the source)
maci   arho      dunyi  imu     agi.gd.ga     athu.jiya?
water  upstream  side   people  walk.ing.PRS  see.QM
“Do you see people walking upstream?”


24. éwanyti on the other side
maci   éwanyil     dé     gane  grd.gayi
river  other side  stand  then  shout.PRS
“He is standing on the other side of the river and shouting”


25. flin(y)ti on my side (originally of a river)
nyu      ilin(y)i  nga  mbromro  ji
you sg.  this      me   with     sit
“You sit this side with me”


26. élanti on this side (originally of a river)
nytt     elanu      ibi.l6
you sg.  this side  come from.LOC
“You come this side please”


These words have been extended to more general contexts in recent times. 


Figure 3: Directionals from a riverbank 


[Figure 3 not reproduced; the only surviving label is Source.]

On both sides
The following locative expressions have been extended from terms which applied originally only to
rivers.

27. éndnyfi hoya both sides
énonyit  hoya   né   agu.pra.gayi       ma
both     sides  LOC  walk.possible.PST  AFF
“It is possible to walk from both sides”

28. énonyt...déga describes something which is positioned on both sides of the subject
nyu  énonyi  imu     dé.ga
you  side    people  stand.PRS
“There are people standing on both sides of you”

29. éndnyfi dunt at both ends
ngd.ci  0      énonyu  duni   maci   khda.ga
my      house  both    sides  water  lie.PRS
“Water is lying on both sides of my house”

Villages


It is likely that prior to the 1951 earthquake, most Idu villages were very small, consisting of only a few 
households, scattered up and down the hillside and sometimes not even in sight of one another. Hence, 
it was useful to specify the direction of the ‘upper’ and ‘lower’ village. Sentences 30 and 31 provide 
examples of these terms in use. 


30. anggdéca towards the upper part of the village
nga  oO     hé   Ejénggo  atiko    anggocd     dinyii  kha.gayi
I    house  LOC  Ejengo   village  upper part  side    lie.PRS
“My house is in the northern side of Ejengo village”


31. anggd6po towards the low-lying part of the village
écd   anggdpo   dunyit.né  ama   ya.ga     mpi
here  downside  side.LOC   wind  blow.PRS  AFF
“The wind blows here from the downside”

Figure 4: Directionals within the village

[Figure 4 not reproduced; surviving labels: anggoca (above), anggopo (below).]


Handedness 

Idu also recognises left and right in respect of an individual. Unlike the other directionals, these
could be treated as either nominals or adverbials. Sentences 32 and 33 provide examples of these terms
in use.


32. écanyd right side
Iméhi.ci    nyuko  écanyi      kha.gayi
Imehi.POSS  room   right side  lie.PRS
“Imehi’s room is on the right side”


33. lakénydi left side
éca   ngd  lakényii   deé.ga
this  my   left side  stand.AFF
“This is on my left side”


5 Conclusions 

Conventional deictics in Idu are quite sparse and closely resemble those in English in terms of semantic 
structure. However, Idu has an extremely rich system of directionals, in relation to mountains and plains, 
rivers and villages, which is a reflection of the steep environments the Idu traditionally inhabited. The 
neighbouring Tani people have a system which is evidently related conceptually although there are 
apparently no common lexical items. It is likely these directional systems are more widespread in the 
region than is apparent in the descriptive literature.

Abstract

This paper aims to discuss the nominal marking system of the Kinaray-a language. 
Kinaray-a is the language of the Karay-a people, who live in the provinces of Antique, 
Capiz, Guimaras, Iloilo, Palawan (KWF 2016) and some mountainous areas bordering 
Aklan and Antique in the Philippines. However, the Kinaray-a variety discussed in the 
present study is the one spoken in the southern part of Antique Province. The analysis is 
based on written and spoken corpora which include Kinaray-a online news reports, literary 
texts, utterances in naturally occurring conversations, and spoken-like narratives. The 
paper discusses a prototypical noun phrase of Kinaray-a among other Philippine languages. 
A Kinaray-a noun phrase may contain two categories of nominal markers: determiners and 
demonstratives. Furthermore, the paper supports Tanangkingsing’s (2009) description that 
noun phrases in Philippine languages may contain case markers, plural markers, 
determiners, possessive or genitive pronouns, numerals, modifiers, or ligatures, as they are 
also evident in Kinaray-a. Notes on the types of Kinaray-a nouns, their morphological
formation, and linguistic elements such as bare nouns, borrowed nouns, and affixed nouns
are also presented in the appendix of the paper.


Keywords: nominals, nominal marking system, Kinaray-a, Philippine language, language 
documentation 
ISO 639-3 code: krj


1 Introduction 

The Ethnologue describes Kinaray-a as an Austronesian language in the West Bisayan subgroup of the 
Malayo-Polynesian branch. This is the language of the people in the Province of Antique of the 
Philippines. It is also spoken by Karay-a people who live in mountainous areas bordering Aklan and 
Antique (Abadiano 1980) and in the different provinces of Capiz, Guimaras, Iloilo, and Palawan (KWF, 
2016). 

According to Tiongson (1994), the term Kinaray-a is derived from iraya meaning ‘upstream,’
the prefix ka meaning ‘companion,’ and the infix in meaning ‘to have undergone something’. Alternate names are
Hiniray-a, Karay-a, and sometimes Hamtikanon (Delos Santos, 2010) and Antiqueno (KWF, 2016). 
Simons and Fennig (2018) report that there are approximately 380,000 speakers of the language and
that it has a language status of 4 (educational). Currently, the language is recognized by the Department of
Education (DepEd) of the Philippines as one of the major languages under the Mother Tongue-Based
Multilingual Education (MTBMLE) curriculum. It is also commonly used in tourism signage in the
Province of Antique, as well as in internet news reports and Kinaray-a writers’ published literary works.
Delos Santos (2003) claimed that the actual number of speakers is undetermined as the language has
long been incorrectly classified as Ilongo (Hiligaynon). According to him, Hiligaynon was the only
primary language considered in Region VI before the 1980s as it was the language spoken by those who
were dominant in the local government, religion, education, and culture. Kinaray-a was only recognized
as a language later because of the development of Kinaray-a literature in the 1980s.

Figure 1: Distribution of Kinaray-a Language in the Philippines
(Atlas ng mga Wika ng Filipinas, 2016:100) 


In this paper, the Kinaray-a nominals and the nominal marking system are described. It will emphasize 
the constituent order of a typical Kinaray-a noun phrase (hereafter, NP), the types of nominal markers 
used in the language, and the functions of these nominal markers. The morphological formation of 
Kinaray-a nouns will also be presented. 

The Kinaray-a variety in the southern part of Antique Province is considered in this study. Ergative 
case-marking is used in the analysis of the utterances as the language exhibits ergative constructions. 
The data are drawn from both spoken and written corpora. Spoken data include recorded conversations, 
homilies, pear stories, and narratives about cooking dishes, while online news reports and literary works
are utilized as the written data. 


2 Nominal Markers 

A Kinaray-a NP may contain a determiner or demonstrative and a head noun or pronoun. The nominal 
markers take the initial position of NPs. The phrases below are the examples of how Kinaray-a NPs are 
constructed: 


Determiner 
1. ang bata 
det child 
‘the child’ 
Demonstrative 
2. dya nga bata 
dem lig child 
‘this child’

The utterances 3 and 4 are examples of nominalized clauses introduced by the determiner ang.


3. Ang ginhambal nanda.
det say-pfv erg.3p
‘What they said.’

4. Ang ginahindian na.
det like-neg erg.3s
‘What he/she doesn’t like.’


Also, Kinaray-a NPs require a ligature to connect the nominal marker and a modifier to the head noun, 
as in example 5. 


5. Ang manani nga panahon. 
det nice lig weather 
‘Nice weather.’


Although it is common for a Kinaray-a NP to have a determiner as its nominal marker, there are some cases
in which a marker is not required. These are vocatives and answers to questions, as in examples 6 and 7
respectively.


Vocative 

6. Ginoo, kaloy-i kami 
Lord mercy  abs.1p 
‘Lord, have mercy on us’ 


Answer to a question 

Question: Ano ang ngaran mo Ma’am?
q-what det name abs.2p hon
‘What is your name, Ma’am?’


7. Answer: Elvie 
Elvie 
‘Elvie’ 


2.1. Determiners 

Like other Philippine languages, determiners in Kinaray-a encode number (singular and plural), case 
(absolutive, ergative/genitive, and oblique/locative), and they distinguish between common and 
personal nouns. Table 1 presents Kinaray-a determiners. 


Table 1. The Kinaray-a Determiners 


Case               Common                Personal
                   Singular  Plural      Singular  Plural
Absolutive         ang       ang mga     si        sanday
Ergative/Genitive  ka(ng)    ka(ng) mga  ni        nanday
Oblique/Locative   sa        sa mga      kay       kanday/kananday
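
The paradigm in Table 1 can also be read as a lookup keyed by case, noun class, and number. The following is a minimal sketch in Python, offered only as an illustration of that reading; the names DETERMINERS and determiner are hypothetical and are not part of the study, and the forms are copied from the table as written (including the optional ng of ka(ng)).

```python
# Illustrative only: Table 1 encoded as a lookup table.
# DETERMINERS and determiner() are hypothetical names, not from the study.
DETERMINERS = {
    ("absolutive", "common", "singular"): "ang",
    ("absolutive", "common", "plural"): "ang mga",
    ("absolutive", "personal", "singular"): "si",
    ("absolutive", "personal", "plural"): "sanday",
    ("ergative/genitive", "common", "singular"): "ka(ng)",
    ("ergative/genitive", "common", "plural"): "ka(ng) mga",
    ("ergative/genitive", "personal", "singular"): "ni",
    ("ergative/genitive", "personal", "plural"): "nanday",
    ("oblique/locative", "common", "singular"): "sa",
    ("oblique/locative", "common", "plural"): "sa mga",
    ("oblique/locative", "personal", "singular"): "kay",
    ("oblique/locative", "personal", "plural"): "kanday/kananday",
}


def determiner(case, noun_class, number):
    """Return the Table 1 form for a given case, noun class, and number."""
    return DETERMINERS[(case, noun_class, number)]


# Example: the absolutive determiner for a plural common noun.
print(determiner("absolutive", "common", "plural"))
```

Under this reading, each NP-initial marker is fully determined by three features, which is the sense in which the determiners encode number, case, and the common/personal distinction.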


The determiner ang may precede a single-word noun (8), an abstract noun (9), a relativized
clause (10), a verb-like item (11), a preposition (12), an honorific (13), and an NP with
modifiers (14):

8. ang balay
det house
‘the house’

9. ang paglaum
det hope
‘the hope’

10. ang tawo nga tandus
det person lig industrious
‘the industrious person’

11. ang gingamit nga pangrara
det use-pfv lig to.weave
‘the (material) used for weaving’

12. ang para kana
det prep abs.3p
‘for her/him’

13. ang Mayor
det hon
‘the Mayor’

14. ang mga nami kag tibay nga hilo
det plu nice conj strong lig thread
‘the nice and strong thread’


The markers that introduce a core argument are called core nominal markers. If the core argument is a
full noun phrase, it is usually introduced by a determiner, as in 15. Otherwise, it may be expressed by an
absolutive pronominal without a marker, as in 16.


15. Nagpanaw ang bata.
walk-pfv det child
‘The child walked away.’

16. Nagpanaw tana.
walk-pfv abs.3s
‘He/She walked away.’


2.1.1. Nominal marker ‘sa’ 
The oblique marker sa usually precedes locative nouns, as in sentences 17 and 18.


17. Duro gemstone ang makita sa suba.
more gemstone abs find-ipfv obl river
‘(You can) find more gemstones in the river.’


18. Lain gid ang pangabuhi sa uma.
different par abs lifestyle obl farm
‘The lifestyle is really different on the farm.’

The marker sa may also signify temporal marking such as specific times or parts of the day in an oblique
phrase, as in sentence 19.


19. Sanag ang mga bituon sa kagab-ihun.
Sanag ang mga bituon sa kagabʔihun
bright abs plu star obl night
‘The stars (shine) brightly at night.’


The nominal marker sa can also mark the oblique case in a benefactive construction. 


20. Nagtugro tana ka bugas sa mga katawhan. 
give-pfv abs.3s det rice obl plu people 
‘He/She gave rice to the people.’


2.1.2. Nominal marker ‘si/sanday’ 

The marker si is the counterpart of ang for personal nouns. It marks a particular absolutive personal 
argument. Sanday is the plural counterpart of si. An example of the nominal singular marker si is 
presented in sentence 21, and the plural counterpart sanday is in sentence 22. 


21. Nahulog si Mark sa kahoy.
fall-pfv abs Mark loc tree
‘Mark fell (from the) tree.’


22. Sanday nanay kag tatay lang ang tawo sa balay.
abs mother conj father par det people loc house
‘Only mother and father are at home.’


2.1.3. Nominal marker ‘ni’
The nominal marker ni is the counterpart of ka(ng) for personal nouns and serves as the agent of a 
transitive verb or as the possessor in a possessive construction. 

In transitive constructions, the common determiner ka and personal determiner ni introduce the 
agent, and the determiner ang introduces the patient. Interestingly, the determiners ka and kang (23a 
and 23b) may be used interchangeably without any semantico-syntactic impact.


23a. Ginkaun ka bata ang peras.
eat-pfv erg child abs pear
‘The child ate the pear.’

23b. Ginkaun kang bata ang peras.
eat-pfv erg child abs pear
‘The child ate the pear.’

24. Ginkaun ni Ruzel ang pagkaun ni Allysa.
eat-pfv erg Ruzel abs food gen Allysa
‘Ruzel ate the food of Allysa.’


When the pronominal counterpart is used, the agent is marked by an ergative determiner and the patient
is marked by an absolutive determiner, as in sentence 25.

25. Ginturuk na ang kanding nga nagapanaw.
watch-pfv erg.3s abs goat lig walk-cont
‘He/She watched the goat walking.’


Similar to Cebuano (Tanangkingsing 2009) and Hiligaynon (Santos 2012), Kinaray-a NPs may exhibit
any of the following patterns. It is interesting to note, however, that the ligature ka is only used with
numerals, as in 29, while the NPs in 27, 28, and 30 take nga.


1. Determiner + mga + Noun

26. ang mga tawo
det plu people
‘the people’


2. Determiner + Possessor Pronoun + nga + Noun

27. ang anda tanan nga gamit
abs gen.3p whole lig thing
‘(all) of their things’


3. Determiner + Modifier + nga + Noun

28. ang pwerte kalapad nga parayan
abs par wide lig rice.field
‘the wide rice field’


4. Determiner + Numeral + ka + Noun

29. sa sara ka basket ka peras
obl one lig basket gen pear
‘in one basket of pears’


5. Modifier (clausal) + nga + Noun

30. Bahol nga bata
big lig kid
‘(a) big kid’


2.2. Demonstratives 

Demonstratives or deictic pronouns in Philippine languages are generally divided into three sets 
expressing nearness to the speaker, nearness to the addressee, and remoteness from either of the two 
(McFarland 2008). The same feature is also evident in Kinaray-a. Table 2 enumerates the 
demonstratives in Kinaray-a. 


Table 2. The Kinaray-a Demonstratives 


Proximal Medial Distal 
Case (near the speaker) (near the listener) (far from both) 
Kinaray-a Gloss Kinaray-a Gloss Kinaray-a Gloss 
Absolutive dya(ay) this ra(ay) that to(ay) that 
ria(n) 
Ergative dya(ay) this ra(ay) that to(ay) that 
ria(n) 
Oblique (ri/ru)dya here (ri/ru)dyan there (rig/rug)to there 
(ri/ru)gya (ri/ru)gyan

It is interesting to note that Kinaray-a demonstratives have two or more variants for all three spatial
orientations. Each deictic pronoun has several forms to express different cases (McFarland 2008). In
Kinaray-a, nouns in the absolutive case may be preceded by dya(ay) ‘this’, ra(ay) ‘that’, or to(ay) ‘that’
(yonder).
In Kinaray-a, the demonstrative dya (or the variant dyaay) ‘this’ is used as a proximal absolutive 
deictic pronoun. But dya may be used with an additional emphatic form amo, which seems to be an all- 
purpose deictic that can stand alone and expresses emphasis without spatial reference. The examples of 
how these deictic pronouns are used are presented in sentences 31a and 31b. 


31a. Dya ang akun bata. 
prox abs gen.1s child 
‘This is my child.’ 


31b. Amo dya ang akun bata. 
emp prox abs gen.1s child 
‘This is my child.’ 


32a. Kanakun ang balay nga dya. 
gen.1s abs house lig prox 
‘This house is mine.’ 


Based on sentences 31a to 32a, it can be claimed that Kinaray-a demonstratives may be either 
pre-nominal or post-nominal. Two features should be noted here. The first is the use of the ligature nga. 
If a noun phrase precedes the demonstrative dya, the demonstrative may be preceded by the ligature nga, 
as shown in 32a. Conversely, if the noun phrase appears after dya, nga is placed after dya, as illustrated 
in 32b. 


32b. Kanakun ang dya nga balay. 
gen.1s abs prox lig house 
‘This house is mine.’ 


A construction where a genitive appears in between the demonstrative and the nominal is also 
grammatical. 


32c. Dya ang akun nga balay. 
prox abs gen.1s lig house 
‘This is my house.’ 


Constructions without nga are also considered grammatical; however, the use of nga makes the 
sentence more emphatic. 


32d. Dya ang akun balay. 
prox abs gen.1s house 
‘This is my house.’ 


The medial demonstrative ra(ay) ‘that’ in 33 and the distal demonstrative to(ay) ‘that’ in 34 show 
the same grammatical features as dya(ay) ‘this’ in the absolutive case, as discussed above. 


33. Amo ra ang niyog siguro. 
emp med abs coconut maybe 
‘Maybe, that’s the coconut.’ 


34. Naghapus run to ang langka. 
soft-pfv par dis abs jackfruit 
‘The jackfruit has softened already.’ 


It is noted that word classification in Philippine languages is based on the interrelationship among 
various affixes and their use in discourse (Santos 2012). For instance, there are noun roots as well as 
nouns derived from verbs, adjectives, or any other grammatical category (cf. Dita 2011). There is also no 
clear lexical distinction between nouns and verbs in Philippine-type languages, a feature considered 
unique to Philippine languages (Amerila 2018). Nolasco (2011) also asserts that the grammatical 
categorization of Philippine-type languages, particularly of root forms, has proven difficult, especially 
at category boundaries. 

Kinaray-a nominals share these morphological properties with Philippine languages. Notes on the 
different types of Kinaray-a nouns especially on the affixation process are presented in Appendix A of 
the paper. 


3 Conclusion 

This paper describes the constituent order of a typical Kinaray-a NP. Determiners and demonstratives 
are the two types of Kinaray-a nominal markers. These markers usually introduce the head nouns in 
NPs. The Kinaray-a determiners for common nouns are ang, ka(ng), and sa. These are basically singular, 
but they can be made plural by adding the marker mga after the determiner, yielding ang mga, ka(ng) 
mga, and sa mga. The personal determiners are si, ni, and kay, and their plural counterparts are 
sanday, nanday, and kanday, respectively. This paper also highlights the types of deictic pronouns based 
on spatial orientation, that is, proximal, medial, and distal, which can be marked in the absolutive, 
ergative, and oblique cases. Finally, notes on types of nouns and nominalizing prefixes are provided in 
Appendix A. 


Appendix A - Notes on Types of Nouns in Kinaray-a 

In this section of the paper, the two classes of Kinaray-a nominals are presented: unaffixed nouns are 
depicted in sub-sections A.1 and A.2, which feature bare nouns and borrowed nouns, respectively, 
whereas section A.3 illustrates how nouns are generated by affixation. 


A.1 Bare nouns 

In this paper, we refer to nouns without affixes as bare nouns. Body part terms are an example of bare 
nouns. Table 3 lists Kinaray-a body part terms, together with their English equivalents, as instances 
of bare nouns. 


Table 3. Body Part Terms in Kinaray-a 


Root Gloss Root Gloss 
talinga ‘ear’ butkun ‘arm’ 
siko ‘elbow’ iruk ‘armpit’ 
mata ‘eye’ likod ‘back’ 
kiray ‘eyebrow’ lawas ‘body’ 
pungyahun ‘face’ tul-an ‘bone’ 
kahig ‘feet’ suso ‘breasts’ 
tudlo ‘finger’ buli ‘buttocks’ 
buon-buonan ‘fontanel’ bagi-ing ‘cheek’ 
dahi ‘forehead’ dughan ‘chest’ 
alima ‘hand’ silang ‘chin’ 


Utterances 35 and 36 below use kahig ‘foot’ and alima ‘hand’, respectively, as examples of body 
part terms as bare nouns. They are unaffixed words and function as the heads of their phrases. 


35. May mga agi kang kahig nga makita sa daray-ahan. 
May mga agi kang kahig nga makita sa daray?ahan. 
exist plu trace det foot lig see-cont obl seashore 
‘There are traces of footprints which can be seen on the seashore.’ 


36. Kinahanglan nga limpyo ang alima bag-o magkaun. 
stat-should lig clean abs hand before ipfv-eat 
‘(Your) hands should be clean before eating.’ 


A.2 Borrowed nouns 

Spanish and English loanwords make up the majority of Kinaray-a borrowed nouns. This could be due 
to the cultural influence of the Spanish and American occupations in the Philippines, as well as the fact 
that Kinaray-a speakers are bilingual, with English as a second language. Table 4 shows a few examples 
of borrowed nouns. 


Table 4. Kinaray-a Borrowed Nouns 


Borrowed Nouns Origin Gloss 
gobernadora Spanish governor 
domingo Spanish Sunday 
softcopy English softcopy 
telebisyon English television 
refrigerator English refrigerator 


37. Ginpaathag man kang gobernadora ang sabahin 
pfv-explain also det lady.governor det about 
kang mga kabataan nga nadura sa listahan 
det plu youth lig misplace-pfv obl list 
kang provincial scholars. 
det provincial scholars 
‘The lady governor also explained about the youth whose names were misplaced in the list of 
provincial scholars.’ 


The Kinaray-a word gobernadora ‘woman governor’ is used in sentence 37. It is borrowed from the 
Spanish word gobernadora, the feminine counterpart of the masculine gobernador. Another Kinaray-a 
loanword is used in sentence 38: domingo ‘Sunday’ is a Spanish loanword that refers to a day of the 
week. 


38. Kaduro ka tawo sa tinda kun domingo. 
a.lot det people obl market during Sunday 
‘There are a lot of people in the market on Sundays.’ 


A.3 Affixed nouns 

Some nouns in Kinaray-a are formed by affixation. These derived nouns are generated by attaching 
nominalizing affixes to roots or stems. Table 5 presents three Kinaray-a nominalizing prefixes. 


Table 5: Morphological Derivations of the Kinaray-a Nominals 


Affix Root Gloss Affixed Gloss 
ka- subu sad kasubu sadness 
lapad wide kalapad wideness 
istorya to talk kaistorya person you are talking to 
imaw to accompany kaimaw companion 
sakay to ride kasakay person together on board 
paN- raha cook pangraha used for cooking 
balay house pangbalay used in the house 
lagaw walk panglagaw used for walking 
taga- Antique Antique taga-Antique from Antique 
banwa town proper taga-banwa from the town proper 
butung pull taga-butung person who pulls something 


The prefix ka- functions either as an abstract nominalizer or as a marker of reciprocal (shared) action. 
For instance, when it is attached to the root subu ‘sad’, it derives kasubu ‘sadness’. Meanwhile, when 
it is attached to the root istorya ‘to talk’, it derives the noun kaistorya ‘person you are talking to’. 


39. Nakabatyag tana ka kasubo 
pfv-feel abs.3s det sadness 
‘He/She felt the sadness.’ 


40. Si Lyjoe ang ana kaistorya kang nagaraha sanda. 
det Lyjoe abs erg.3s person.talking.to when pfv-cook abs.3pl 
‘It was Lyjoe who he/she was speaking to when they were cooking.’ 


The prefix taga- refers to a person’s place of origin. In this case, the prefix is followed by the name 
of a particular place, as in sentence 41. 


41. Mga karay-a ang taga-Antique. 
plu karay-a det ori-Antique 
‘The karay-a (speakers) are from Antique.’ 


In a nominalized term, the prefix taga- also denotes the doer of the action. In this use, taga- acts as an 
initiator, as in sentence 42. When the prefix is added to a verb’s base form, it indicates that a person 
has been assigned or hired to carry out the action represented by the base word. The Tagalog equivalent 
is also taga-, with the same meaning. 


42. Kinahanglan ka tagabutong ka hilo sa makina. 
need det ini-pull det thread loc machine 
‘A person is needed to pull the thread in the machine.’ 
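

The three prefixation patterns in Table 5 can be restated as a small sketch. This is only an illustration: the prefixes, roots, and glosses are copied from Table 5, while the function name derive_noun and the joining rules are assumptions made for the example. In particular, spelling paN- as pang- is taken from the three forms listed in the table and is not claimed to hold for other roots.

```python
# Illustrative sketch only. Prefixes, roots and glosses are copied from Table 5;
# the function name and the joining rules are assumptions, not the authors' analysis.

def derive_noun(prefix: str, root: str) -> str:
    """Attach a Kinaray-a nominalizing prefix to a root, spelled as in Table 5."""
    if prefix == "ka-":
        return "ka" + root
    if prefix == "paN-":
        # Table 5 spells the nasal as 'ng' before raha, balay and lagaw;
        # the sketch makes no claim about other roots.
        return "pang" + root
    if prefix == "taga-":
        # Table 5 writes taga- forms with a hyphen.
        return "taga-" + root
    raise ValueError(f"prefix not covered by this sketch: {prefix!r}")


# (prefix, root, gloss of the derived noun), all taken from Table 5.
TABLE_5 = [
    ("ka-", "subu", "sadness"),
    ("ka-", "lapad", "wideness"),
    ("paN-", "balay", "used in the house"),
    ("taga-", "banwa", "from the town proper"),
]

for prefix, root, gloss in TABLE_5:
    print(f"{prefix} + {root} -> {derive_noun(prefix, root)} '{gloss}'")
```

The sketch only concatenates forms; it does not model which roots accept which prefix, nor the difference between the abstract and the reciprocal readings of ka-.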


Appendix B 

1 - 1st person 
2 - 2nd person 
3 - 3rd person 
abs - absolutive 
conj - conjunction 
cont - continuative 
dem - demonstrative 
det - determiner 
dis - distal 
emp - emphatic marker 
erg - ergative 
exist - existential 
gen - genitive 
hon - honorific 
ini - initiator 
ipfv - imperfective 
lig - ligature 
loc - locative 
med - medial 
neg - negative 
obl - oblique 
ori - origin 
par - particle 
pfv - perfective 
plu - plural marker 
prep - preposition 
prox - proximal 
q - question word 
s - singular 
stat - stative 


Abstract 

The purpose of this paper is to present the extent of nominalization and its functions in 
Liangmai. The study includes lexical or derivational nominalization and structures 
involving the nominalization of clauses. This paper briefly describes the morpho-syntax 
of nominalization and the constructions in which nominalized forms occur in Liangmai. 
The nominalizers in this language include the suffixes -bo, -bam, and -mai and the prefixes 
ka- and pa-. Lexical nominalization includes derivation of abstract nouns from stative root 
verbs, agentive nominals from both nouns and verbs, adjectivals from stative intransitive 
verbs and gerunds. The nominalizing suffix -bo is the most common and is highly 
productive. All verbal roots can be nominalized by the suffix -bo, and the resulting forms 
can be interpreted as abstract nouns, gerunds, attributive adjectives, relative clauses, among 
others. In Liangmai, the suffix -bo functions variously as a nominalizer, relativizer or 
complementizer. Liangmai exhibits externally headed relative clauses in which the head 
nouns appear to the right of relative clauses. 


Keywords: Liangmai, Nominalization, Relativization, Complementation, Derivational, 
Clausal, Tibeto-Burman 
ISO 639-3 codes: njn 


1 Introduction 

Nominalization is one of the most prominent characteristics and a highly productive phenomenon of 
the Tibeto-Burman (henceforth TB) languages (Bickel 1999, Watters 2006, 2008, Genetti et al. 2008, 
2011, Noonan 1997, Delancey 2002, 2011). It refers to the process by which we derive nominals from 
a word of another class. Yap et al. observe that: 


....nominalization in its core sense refers to the process by which we derive nominal expressions — 
for example, from verbs (e.g., watch > watcher) or adjectives (e.g., narrow > narrowness, narrowing). 
Clauses may also be nominalized (e.g., awaken the public conscience > awakening (of) the public 
conscience). (Yap et al. 2011:3) 


The study of nominalization in TB languages begins with Matisoff's seminal paper ‘Lahu 
nominalization, genitivization, and relativization’ (1972). He showed that the functions of 
nominalization, relativization and genitivization in Lahu are marked by the same particle ve. A similar 
complex of functions revolving around a single morpheme occurs in other TB languages (Delancey 
2002:56). This morphological convergence of syntactic functions was dubbed the ‘Standard Sino- 
Tibetan Nominalization’ (SSTN) pattern (Bickel 1999:271) and has been reported in many studies of a 
number of TB languages. Two levels of nominalization process are observed: lexical or derivational 
and clausal. Derivational nominalization refers to the process which creates lexical nouns from words 
of other lexical categories (usually the verb root), while clausal nominalization operates in the domain 
of clause and works at syntactic level to allow a grammatical clause to be treated as a noun phrase 
within a broader syntactic context (Genetti et al. 2008). Nominalized clauses are used in a wide range 
of functions and syntactic structures in TB languages, including attributive clauses, adverbial clauses, 
nominal-complement constructions, relative clauses, free-standing independent clauses, and so on. 

Both these nominalization processes are attested in Liangmai, a TB language spoken in Manipur 
and Nagaland. The primary purpose of this paper is to present a brief overview of nominalization in 
Liangmai. The nominalizers found in the language are -bo, -bam, -mai, ka- and pa-. The uses and 
functions of these nominalizers are discussed in the following sections. 


2 Language Classification and Typological Overview 

In the classification of Tibeto-Burman languages in the Linguistic Survey of India (LSI), Grierson 
(1903) was slightly hesitant as to whether Liangmai (called ‘Kwoireng or Liyang’) should be placed 
under the Naga-Bodo or the Naga-Kuki sub-group. He stated: 


Their language appears to be an intermediate one between the Naga-Bodo and the Naga-Kuki group. 
The pronouns agree best with the latter, and so I class it here, though its geographical position would 
incline one to put it with the former set of languages (Grierson 1903:462). 


Other than some lexical items listed under ‘Kwoireng or Liyang’, no description of the language was 
provided in LSI. Marrison (1967), in his comprehensive survey of the languages of Northeast India, 
placed Liangmai in his posited Nruanghmei group, ‘Type B-3’, along with Zeme, Mzieme, Nruanghmei 
(formerly Kabui), Khoirao, Maram and Puiron. The close affinity among the Zeme, Liangmai and 
Rongmei is also reflected in Bradley’s (1997) classification of Tibeto-Burman languages in which 
Liangmai was placed under the Zeliangrong group, which falls under the Southern Naga of the Kuki- 
Chin-Naga. More recently, Burling (2003) grouped Liangmai, along with Zeme and Rongmei, under 
the Zeme group, which was again referred to as Western Naga in Post and Burling (2017). 

Two major word classes in Liangmai include nouns and verbs. Most nouns are free roots, and they 
can be monosyllabic, disyllabic or trisyllabic. Nouns with more than three syllables are mostly 
compounds. There are also bound noun roots, and these take one of the four formatives ma-, ta-, tsa- 
and ka- to become free-standing forms. These formatives may or may not be retained in compound 
formation; however, they are usually dropped when a noun takes a personal pronoun proclitic. The 
semantics of these formatives is not yet fully known. All verbal roots are bound, and they can be free 
standing words only if they are minimally affixed (Daimai & Singha 2020:127). There are both 
monosyllabic and disyllabic verb roots, but trisyllabic roots are rare. Verb roots are also used to derive 
adverbs and adjectivals. A verb is inflected for tense, aspect and mood. Liangmai is a tonal language 
having at least three contrastive lexical tones, namely high, mid and low. The language is highly 
agglutinative. 

As with most TB languages, Liangmai is verb-final. The A argument of a transitive clause is 
typically marked with agentive -niu, though it is not obligatory, whereas the animate O argument is 
marked with a primary object marker -tu. The inanimate O remains unmarked. OAV word order is also 
permissible and is used to place emphasis on the O argument. Noun phrases are head-medial with 
variable ordering of modifiers. They consist of an obligatory noun head which may be preceded and 
followed by optional modifiers. Adjectivals can occur in both pre- and post-head positions. 
Demonstratives occur in pre-head position, while numerals occur in post-head position. To signal 
possession, personal pronoun proclitics are prefixed to the root of a noun, as in alu (1-POSS farm) 
‘my farm’ and nalu (2-POSS farm) ‘your farm’. The possessor noun phrase can also be marked with 
the possessive enclitic -gu, suffixed to the possessor of the head noun, as in a-gu tsalu (1-POSS 
farm) ‘my farm’ and na-gu tsalu (2-POSS farm) ‘your farm’. The possessive enclitic -gu marks the 
possessor noun phrase, i ‘I’ and naŋ ‘you’, which precedes the head noun tsalu ‘farm’. The dependent 
clause generally precedes the matrix clause. 


3 Data and Methodology 

According to the Ethnologue (Eberhard et al., 2021), the 2011 census estimates that there are 49,800 
speakers of Liangmai. The Liangmai Naga Council, Manipur (LNC, M), an apex organization of the 
tribe, also gives a similar figure of 50,000. The majority of Liangmai speakers can be found in the 
bordering area of Tamenglong district of Manipur and Peren district of Nagaland. There are 
approximately 100 Liangmai villages, most of which are in the state of Manipur. Each Liangmai village 
has a distinct variety, but the varieties of different villages are mutually intelligible to a great extent. 
There is not much of a media platform for Liangmai, and Liangmai literature is still in an early stage of 
development. There is a Bible and hymnals translated into Liangmai, and the language used in the Bible 
serves as the standard form when it comes to writing and literature. Nevertheless, the influence of each 
author's own variety can be seen in the few books written so far by Liangmai authors. 

For the current study, as a native speaker, I have collected most of the data from introspection and 
elicitation. A list of words adapted from the Swadesh list, expressions used in daily life, and basic phrases 
and sentences in Liangmai elicited from my principal informant, Phenlakbou Marenmai, aged 67, of 
Tharon village, Manipur, in 2018-2019 were used for the analysis. In addition, I have also extracted 
relevant data from available written Liangmai sources, mainly the Holy Bible. I also refer to my old 
field notes for relevant data. 


4 Derivational or Lexical Nominalization in Liangmai 

Liangmai has multiple nominalizers that are involved in the derivation of nouns. Lexical nouns are 
derived mainly from verbs and sometimes from nouns. Deverbal nouns are highly productive and 
are widely attested in normal discourse and speech acts, and they can head NPs in Liangmai. The 
nominalizing suffixes in Liangmai are -bo, -bam and -mai. The language also has the nominalizing 
prefixes ka- and pa-. These nominalizers are discussed in detail in the following sections. 


4.1. Nominalizer -bo 

The nominalizing suffix -bo is multifunctional. It is used as a general nominalizer in Liangmai. It is 
highly productive and can be suffixed to all verb roots to derive deverbal nouns that can head an NP. 
When this nominalizer is attached directly to a verb, the resulting form, taken out of context, may have 
several possible interpretations and functions. It may be used to derive abstract nouns; for example, the 
verb root piŋ means ‘be afraid’, whereas piŋbo means ‘fear’, as in (1). 


(1) pa piŋbo ha-e 
3SG afraid-NMZ N.COP-DECL 
‘He has no fear.’ 


Similarly, the following abstract nouns are derived by suffixing nominalizer -bo to stative root verbs. 


(2) (a) lunsa ‘to love’ > lunsabo ‘love’ 
(b) sai ‘to die’ > saibo ‘death’ 
(c) masan ‘be clean’ > masanbo ‘cleanliness, holiness’ 
(d) kim ‘to satisfy’ > kimbo ‘satisfaction’ 
(e) mat'a ‘be happy’ > mat'abo ‘happiness’ 
(f) hu ‘be brave’ > hubo ‘courage’ 
(g) tsaliay ‘be proud’ > tsaliaybo ‘pride’ 


Derived abstract nouns can function as heads of noun phrases, as given in (3). 


(3) ... lunsa-bo si-niu di-t'u-e 
love-NMZ EMP-AGT big-SUP-DECL 
‘...love is the greatest.’ 


Action nominals can be formed by suffixing -bo to action verbs. 


(4) (a) kap-bo ‘crying’ 
(b) tiu-bo ‘eating’ 
(c) pak-bo ‘running’ 
(d) ken-bo ‘reading’ 
(e) rao-bo ‘writing’ 
(g) zi-bo ‘sleeping’ 
(h) mattenbo ‘playing’ 
(i) mak*iubo ‘coughing’ 
(j) kariabo ‘squeezing’ 


Nominalized action verbs can function as noun modifiers, as given in (5), and can also function as heads 
of noun phrases, as in (6). 


(5) naŋ-niu rao-bo ariak delam bam lo 
2P-AGT write-NMZ book where EXST IMP 
‘Where is the book you wrote?’ 

(6) pak-bo-niu aliu-ley wi ne 
run-NMZ-AGT 1PL-BEN good IRR 
‘Running will be good for us.’ 


Attributes such as dimension, age, value and color are expressed by nominalized stative intransitive 
verbs when used as noun phrase attributes, or as verbless clause complements in ascriptive clauses. 
These are derived from stative intransitive verbs using the nominalizing suffix -bo: t'eybo be.long-NMZ 
‘long’, wibo be.good-NMZ ‘good’, and others. 


(7) (a) siam ‘be small’ > siambo ‘small’ 
(b) di ‘be big’ > dibo ‘big’ 
(c) heŋ ‘be red’ > heŋbo ‘red’ 
(d) san ‘be new’ > sanbo ‘new’ 
(e) kha ‘be bitter’ > khabo ‘bitter’ 
(f) rai ‘be first’ > raibo ‘first’ 

(8) ariak san-bo 
book be.new-NMZ 
‘A new book.’ 

(9) tsaki di-bo 
house be.big-NMZ 
‘A big house.’ 

(10) apui tsapʰai kek-bo tsarui-bam-e 
1.mother cloth tear-NMZ stitch-PROG-DECL 
‘My mother is stitching torn cloth.’ 


Gerunds are also formed by adding the nominalizing suffix -bo to verbs. 


(11) zao sak-bo tsapum-ley sa-e 
wine drink-NMZ body-BEN bad-DECL 
‘Drinking is injurious to health.’ (Lit.: Drinking wine is bad for body) 


(12) matʰiu-niu wi pui kap-bo you-e 
everyone-AGT DEM woman cry-NMZ see-DECL 
‘Everyone sees/saw that woman crying.’ 


Additionally, -bo can be used to nominalize clauses as well. Clausal nominalization is discussed in 
Section 5. 


4.2. Nominalizer -bam 

The nominalizer -bam derives a locational noun from action verbs. The derivation expresses a meaning 
of ‘the place where VERB’. It is a highly productive process applying to both transitive and intransitive 
verb roots. A few examples of locational nominalized forms are given below. 


(13) (a) zuŋ ‘to pee’ > zuŋ-bam ‘place to pee’ 
(b) kahum ‘to pray’ > kahum-bam ‘place to pray’ 
(c) alaŋ ‘to cook’ > alaŋ-bam ‘place to cook’ 
(d) tiu ‘to eat’ > tiu-bam ‘place to eat’ 
(e) zi ‘to sleep’ > zi-bam ‘place to sleep’ 

(14) zuŋ-bam delam lo 
pee-NMZ where IMP 
‘Where is the place to pee?’ or ‘Where is the peeing place?’ 


(15) namai-dun-tu kahum-bam-ga pi-gut tu lo 
child-PL-DAT pray-NMZ-LOC CAUS-enter PROH IMP 
‘Do not let children enter the place of praying.’ 


4.3. Nominalizer -mai 

In Liangmai, nouns and verbs can be nominalized by the agentive nominalizer -mai to derive an agentive 
noun. This nominalizer is derived from the noun tsamai meaning ‘man or person’. Its function is similar 
to that of the Mongsen Ao agentive nominalizer -əɹ (Coupe 2007:263). When the base is a noun, the 
derivation usually denotes either a type of agent or a referent whose habitual activity is characterized 
by the meaning of the nominal base (literally, ‘the one who has to do with NOUN’), as in (16a) and 
(16b). When the base is a verb, it denotes ‘the one who does VERB’, as in (16c) to (16f). This type of 
nominalization is restricted to nouns with human referents. 


(16) (a) nam-mai 
village-NMZ 
‘villager’ (Lit.: the one in or from village) 
(b) tsari-mai 
war-NMZ 
‘warrior’ 
(c) ken-mai 
to read-NMZ 
‘reader’ 
(d) mat'en-mai 
to play-NMZ 
‘player’ 
(e) kamsat-mai 
to kill-NMZ 
‘killer’ 
(f) alaŋ-mai 
to cook-NMZ 
‘chef/cook’ 


The scope of this nominalizer includes nominalization of clauses, as given in (17) and (18). 


(17) ariak-ki tat-mai-dun-tu tattuan pit ne 
school go-NMZ-PL-DAT prize give IRR 
‘The ones who go to school will be given prizes.’ 


(18) naŋ-niu si-mai sou-sou lo 
2SG-AGT know-NMZ who-who IMP 
‘Who are the ones you know?’ 
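

The three suffixal nominalizers described in Sections 4.1 to 4.3 can be summarised in a short sketch. It is an illustration under stated assumptions rather than an analysis: the dictionary and function names are hypothetical, tone is ignored, and only root and suffix pairs that are attested in examples (4), (13) and (16) are listed.

```python
# Illustrative sketch only. Suffix functions follow Sections 4.1 to 4.3;
# all names below are hypothetical and tone is ignored.

SUFFIX_FUNCTION = {
    "bo": "general nominalizer",
    "bam": "locational nominalizer ('place to VERB')",
    "mai": "agentive nominalizer ('one who VERBs')",
}


def nominalize(root: str, suffix: str) -> str:
    """Return ROOT-SUFFIX with the morpheme boundary marked by a hyphen."""
    if suffix not in SUFFIX_FUNCTION:
        raise ValueError(f"not one of the suffixes discussed here: {suffix!r}")
    return f"{root}-{suffix}"


# Root/suffix pairs attested in examples (4), (13) and (16).
ATTESTED = [
    ("tiu", "bo", "eating"),
    ("tiu", "bam", "place to eat"),
    ("ken", "bo", "reading"),
    ("ken", "mai", "reader"),
    ("zi", "bam", "place to sleep"),
]

for root, suffix, gloss in ATTESTED:
    print(f"{nominalize(root, suffix)} '{gloss}'")
```

The function does not check whether a given root actually accepts a given suffix; the text states that -bo attaches to all verb roots, while -mai is restricted to human referents.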


4.4. The nominalizing prefix ka- 
The nominalizing prefix ka- attaches to a handful of verbs to create abstract nouns. 


(19) (a) sai ‘die’ > ka-sai ‘death’ 
(b) thiu ‘pain’ > ka-thiu ‘boil, furuncle’ 
(c) tat ‘go’ > ka-tat ‘journey, mission’ 
(d) tiu ‘eat’ > ka-tiu ‘eatables’ 
(e) sak ‘drink’ > ka-sak ‘drinks (liquid food)’ 


Konnerth (2014) reports a nominalizer ke- (with allomorphs ki ~ ka) in Karbi. She writes, ‘this 
nominalizing velar prefix has many apparent cognates across several branches of Tibeto-Burman both 
inside and outside Northeast India, which is productive in deriving nouns from verbs’ (Konnerth 
2014:384). The kV- prefix in Karbi is productive in deriving nouns from verbs, but in Liangmai this 
strategy is not fully productive and cannot be used to derive nouns from all verbs. However, it is possible 
that the nominalizer ka- in Liangmai is cognate with the nominalizing velar prefixes found in Tibeto- 
Burman languages of different branches spoken in Northeast India (Matisoff 2003; Konnerth 2012, 
2014). 

Another function of ka- in Liangmai is to appear with deverbal nouns when they occur as attributes 
of head nouns. This prefix functions in a way similar to the attributive derivational prefix a-, which is 
used to derive adjectives from verbs in Meiteilon (Chelliah 1997:86; Singh 2000:114), and to the 
attributive prefix a-, which frequently appears on the head noun in adjective constructions in Karbi 
(Konnerth 2011:121). For example, ka- is prefixed to wibo ‘good’ to derive kawibo ‘good’, as in kawibo 
tiŋmik ‘a good day’, where tiŋmik is ‘day’. 


4.5. The nominalizing prefix pa- 
When pa- is used with stative verb roots, the resulting words denote abstract entities. 


(20) (a) t'ey ‘be long’ > pa-t'éy ‘length’ 
(b) di ‘be big’ > pa-di ‘breadth’ 
(c) suk ‘be deep’ > pa-suk ‘depth’ 
(d) ku ‘be tall’ > pa-ku ‘height’ 
(e) rit ‘be heavy’ > pa-rit ‘weight’ 

(21) pui-piu-gu lunsiat pa-suk madat si-lak-e 
mother-father-GEN love NMZ-be.deep measure know-NEG-DECL 
‘The depth of parents’ love cannot be measured.’ 


When attached to action-oriented roots, it indicates ‘the manner of’ or ‘the way of’. Note that the 
root morphs in nominalized forms have a rising tone, as in those in (22). 


(22) (a) tat ‘walk’ > pa-tat ‘the way of his/her walking’ 
(b) zi ‘sleep’ > pa-zi ‘the way of his/her sleeping’ 
(c) tiu ‘eat’ > pa-tiu ‘the way of his/her eating’ 
(d) kap ‘cry’ > pa-kap ‘the way of his/her crying’ 
(e) malai ‘move’ > pa-malai ‘the way of his/her moving’ 


5 Clausal Nominalization 

Nominalization is a major tool for creating various types of syntactic structures, such as attributive 
phrases, nominal complement clauses and relative clauses. Each type of nominalized clause is 
discussed below. 


5.1. Attributive phrases 
Liangmai employs attributive phrases in which the root of the verb is affixed by the nominalizing suffix 
-bo, as illustrated in (23) and (24). 


(23) tsalat dinsi-bo piu 
language speak.know-NMZ man 
‘A good orator (Lit. the man who can speak well).’ 


(24) anati-niu liu-bo tsaki 
1DU-AGT buy-NMZ house 
‘The house that we (both) bought.’ 


The phrases with these nominalized forms modify the head noun. 


5.2. Nominal complement clauses 
Nominal complement clauses are also derived with -bo. A nominal complement clause is embedded 
directly into a noun phrase without any further alteration of its structure. 


(25) tsapuan kep-bo mari 
elephant shoot-NMZ story 
‘The story of the shooting of an elephant.’ 


Such constructions also function as noun phrase complements of verbs, as illustrated in (26). 


(26) pa-niu manin-bo i si-e 
3-AGT think-NMZ I know-DECL 
‘I know what he is thinking.’ 


5.3. Relative clauses 
As in many other TB languages, nominalization is the main strategy for forming relative clauses in 
Liangmai. The general nominalizer -bo provides the principal means of nominalizing the verb stem of 
relative clauses. Relative clauses functioning as noun phrase modifiers occur before their heads. The 
elicited examples (27) to (29) demonstrate the pre-head position of relative clauses in verbless and 
verbal clauses. 


(27) [a-tu dap-bo] piu si 
1-DAT beat-NMZ man DEF 
‘The man who beat me’ 


(28) [zao sak-bam-bo] piu si akina-e 
wine drink-NMZ man DEF 1.younger.sibling-DECL 
‘The man drinking wine is my younger brother.’ 


(29) [danai i-niu liu-bo] ariak si ken wi-e 
yesterday I-AGT buy-NMZ book DEF read good-DECL 
‘The book which I bought yesterday is good (to read).’ 


As seen in examples (27) to (29), relative clauses in Liangmai are distinguished by the presence of the 
definite article si after the head noun, following the nominalized clause. Head nouns in Liangmai 
relative constructions commonly appear to the right of the relative clause; however, they may also be 
clause-internal, as in (30). 


(30) maipiu-niu ariak mazenna-ley pi-bo si 
man-AGT book orphan-BEN give-NMZ DEF 
‘The book the man gave to the orphan.’ 


Left-headed and headless relatives in Liangmai are also found in the elicited data. For example, the 
construction in (31) is ambiguous, depending on whether the relative head is taken to be the overt 
nominal ariak ‘book’, or whether the clause is interpreted as headless (‘the one’). 


(31) ariak [liu-bo] si or [ariak liu-bo] Ø si 
book buy-NMZ DEF book buy-NMZ DEF 
‘The book that was bought’ or ‘The one who bought the book’ 


Relative clauses formed with the agentive nominalizer -mai are also found in the language, and such 
clauses also occur in pre-head position. 


(32) malum-mak-mai tsapiu kep piŋ-e 
believe-NEG-NMZ medicine shoot scare-DECL 
‘The one that didn’t believe is scared to take an injection.’ 


5.4. Sentential Complements 

Another function of Liangmai -bo is the nominalization of sentential complements. The nominalizer -bo 
is suffixed to the predicate of the complement clauses. The same construction is employed regardless 
of whether the complement clause functions as a subject (33) or as an object (34) of the matrix clause. 


(33) [tsami tiu kin-bo] tsapum-ley wi-mak-e 
meat eat a.lot-NMZ body-BEN good-NEG-DECL 
‘[Eating too much meat] is not good for body/health.’ 


(34) [pa-niu din-bo] i malum-e 
3SG-AGT say-NMZ I believe-DECL 
‘I believe [in what he said].’ 


6 Conclusion 

This paper briefly discussed different nominalizers and their functions in Liangmai. Like other Tibeto- 
Burman languages of the Himalayan region, Liangmai makes use of derivational (i.e., lexical) and 
clausal nominalization at the morphological and syntactic levels of grammar. We noticed that 
prefixation is limited to the process of lexical derivation whereas suffixation is used in both processes 
of lexical and clausal nominalization. The nominalizer -bo is the most common and most productive 
nominalizer. It is used to derive abstract nouns, action nominals, adjectivals and gerunds. In clausal 
nominalization, it is employed extensively in nominalized clauses, attributive phrases, complement 
clauses and relative clauses. The nominalizers -bam and -mai are used to derive locational and agentive 
nouns respectively. Agentive nominalization is also found to derive relative clauses, though not as 
productively as the general nominalizer -bo. The nominalizing prefixes ka- and pa- are found to occur 
only in the derivational process. 


Abbreviations 

1 first person 

2 second person 
3 third person 
AGT agentive 

BEN benefactive 
CAUS causative 
DAT dative 

DECL declarative 
DEF definite 

DEM demonstrative 
DU dual 

EMP emphatic 
EXST existential 
GEN genitive 

IMP imperative 
IRR irrealis 

LOC locative 
N.COP negative copula 
NEG negative 
NMZ nominalizer 
PERF perfective 

PL plural 


POSS possessive 
PROG progressive 
PROH prohibitive 
SG singular 
SUP superlative 
Abstract 

In the Khomic group within Kuki-Chin, there have been conflicting representations of 
whether all morphemes bear tone, or whether tonelessness is related to syllable type (Herr 
2011; Hornéy 2012; Peterson 2019a). While verb roots in these languages can be 
monosyllabic or sesquisyllabic, they are always bound by affixes and clitics. This study 
offers a fine-grained analysis of tone variation on various syllable types and morphemes. 
I examine verbs in Kanise Khumi using archived word list data (Bryant 2020). In addition 
to pitch specification in various contexts, I focus on the durability of voice quality cues 
that are associated with Kanise tones (Ikeda 2021). Contextual variation appears to accord 
with acoustic studies of minor syllables that have shown that not all minor syllables are 
created equal (Butler 2014), and tonal specification is sensitive to whether the syllable is 
part of a compound, part of the lexical root, or a functional morpheme (West 2014). 


Keywords: tones, verbs, morphophonemics, phonology, phonetics, language description, 
acoustics 
ISO 639-3 codes: cek 


1 Introduction 

This paper focuses on surface tone patterns of citation verbs in Kanise Khumi. My underlying 
assumption is that tone in this language cannot be adequately analyzed without reference to the 
morphological and prosodic structure of words. However, defining these structures is itself no simple 
task (Dixon & Aikhenvald 2002). In the beginning, we can only build initial hypotheses about these 
structures and their tonal patterns based on descriptions of closely related varieties within a language 
cluster or subgroup. We then test those hypotheses with data from a specific language. 

Kanise Khumi is spoken mainly in villages near Sami town, Paletwa Township, Chin State, 
Myanmar. The number of speakers is unknown, but an internal census based on ethnicity has counted 
5,776 Kanise Khumi (Bryant, p.c.). Kanise Khumi has been previously known by the names of some 
of the larger clans, such as Nideun (Eberhard, Simons & Fennig 2020), Tahaensae (So-Hartmann 1988), 
or Uiphaw. It is associated with the ISO language code [cek], a label based on lexical similarity, not 
shared sense of identity. Peterson’s (2017; 2019a; 2012) proposed classification of Kuki-Chin is based 
on a more detailed analysis of the Khumi cluster of languages. He situates the Khumi cluster in the 
Southwestern group within a peripheral branch of Southwest Tibeto-Burman (=Kuki-Chin). Other 
languages in the cluster include those spoken in geographically adjacent areas such as Lemi Chin and 
Mro Khimi as well as those spoken further away in the Chittagong Hill Tracts of Bangladesh such as 
Bangladesh Khumi and Rengmitca. 

An analysis of tone in a Southwest Tibeto-Burman language must take into account several well- 
established premises. One is that tonal variation can mark morphosyntactic relations, and even be the 
sole morphological marker (Hyman 2007; Peterson 2019a; Henderson 1967; Coupe 2007). In addition, 
the syllable is often the tone bearing unit (Lotven et al. 2020; Hyman & VanBik 2002; VanBik 2006; 
Peterson 2019a). However, not all syllables in these languages must bear tone, especially affixes and 
clitics (Hyman 2006; Hornéy 2012; Hyman & VanBik 2004). Aside from morphosyntax, tonality can 
also be dependent on syllable type (e.g., toneless minor syllables vs. tonal major syllables (Herr 2011; 
Hornéy 2012; Hyman & VanBik 2002)). Furthermore, some tones may be constrained by context, for 
example, contour tones restricted to word-final position or prohibitions against adjacent tones (Hyman 
2006). At the same time, contextual effects can be limited to specific tones (So-Hartmann 1989; Sarmah, 
Dihingia & Lalhminghlui 2015). Finally, tone is not synonymous with pitch. In many languages in 
Southeast Asia, tone is signaled through a bundle of phonetic features, including pitch height, pitch 
shape, duration, and voice quality (Brunelle & Kirby 2016; Henderson 1965). 

Descriptions of Khomic languages differ with respect to tone on verbal affixes and clitics. Herr’s 
(2011) aim is to interpret the phonological status of minor syllables in Lemi Chin, not analyze tonal 
variation. She treats all minor syllables as phonologically toneless and all major syllables as tone 
bearing. In contrast, Hornéy (2012) does not base tonelessness on syllable structure, but rather on 
morphological structure. In her phonological analysis of Mro Khimi, she analyzes tone melodies of 
verbs. She describes verb “prefixes” and “suffixes” as underlyingly toneless, regardless of whether they 
are minor or major syllables. Using a lexical phonology approach (Kiparsky 1982), she posits that the 
surface tone on the “clause final particle” is derived through melody association and spreading from the 
verb root. The surface tone of “prefixes” is derived through processes that include spreading, polar tone 
assignment, and lowering. Both of these studies are based primarily on word list data. In comparison, 
Peterson’s analyses of Bangladesh Khumi are constructed from a large corpus that includes long texts 
and elicited data. He has paid extensive attention to tonal variation in diverse morphosyntactic 
environments. In Peterson (2019a), syllables are specified with an underlying tone; however, the “half- 
syllables” found in sesquisyllabic forms are treated as toneless. When he makes a claim about the tone 
of a major syllable that is an affix or clitic (2013; 2019a), it appears to be based on the behavior of that 
specific marker, rather than a general property of all preverbal elements or all postverbal elements. 

Thus, it makes sense to begin analyzing tone patterns on Kanise Khumi verbs with the following 
questions in mind. Do the tones on affixes/clitics appear to be consistent or variable? Do the tones on 
Kanise Khumi affixes/clitics appear to be sensitive to the tone of the verb root? Do any Kanise Khumi 
minor syllables appear to be specified for tone? Following Peterson, this paper takes a morpheme- 
specific approach. I focus on the morphemes that appear most frequently on citation verbs in Kanise 
Khumi word list recordings. First, we look at the Kanise postverbal marker -ta which behaves more like 
Lemi -te than Mro Khimi -de in terms of tone patterns. The rest of the paper is devoted to prefixes. With 
the valency-affecting prefixes p-, t-, and a(y)-, syllable structure appears to differentiate tonal patterns 
as in Lemi. A similar argument could be made for the adjectivizers k- and ka(y)-. In contrast, the tone 
patterns with the highly frequent 3rd-person singular participant marker, ?3-, resist a simple distinction 
between toneless minor syllables and tone-bearing major syllables. An argument based on polar tone 
assignment of prefixes also does not seem appropriate. As such, the tone pattern data raises questions 
about how to define or categorize minor syllables in Kanise Khumi and about morpheme productivity. 


2 The Data 

The data for this paper comes from an elicited word list that includes 2,076 items (Bryant 2020). The 
.wav files and Excel database were downloaded from Zenodo. The word list is based on the EFEO- 
CNRS-SOAS Word List for Linguistic Fieldwork in Southeast Asia (Pain et al. 2019). Most of the 
recordings in the list are from one male speaker in his 50s. About 193 of the items were recorded with 
a younger male speaker. Some of these items are duplicates of those recorded by the main speaker. Both 
men are multilingual. The word list was elicited using Burmese glosses. During elicitation, Bryant 
transcribed the words phonetically and the speaker wrote the words in the current orthography. Three 
tokens were recorded for each item. The speaker was also asked to parse the response into what he 
perceived as words wherever possible. When he did so, he pronounced the entire item three times, then 
three tokens of each separate “word.” The main speaker generally used very slow, careful speech. 


2.1 Syllable structure 
Most Kanise Khumi syllables have CV structure. Table 1 shows attested syllable structures. 
Table 1: Kanise Khumi syllable types 

CV       261 ‘house’                        kod.dgd ‘to boil’ (intransitive) 
CVV      suil ‘gold’                        to1.?aul ‘to yell, speak loudly’ 
CCV      am 1.pra\ ‘uncle’                  ?amt.pred ‘to divorce’ 
CCVV     tal.pratl ‘cemetery, graveyard’    tsal.p*rail tat ‘to read’ 
CVN.CV   nini.nil ‘they two’                ?ami.bau1 ‘to lie prone, on one’s stomach’ 
N.CV     ni.tsod ‘skewer’                   m.beN jat ‘to give birth’ 


Simple onsets are shown in Table 2. Quite a few allophones occur with fricatives. The affricate /ts/ may 
freely be pronounced [s]. The palatal fricative is variably realized as [j ~j~]. The velar fricative /y/ also 
varies in manner of articulation, [y ~uj~g]. Permissible complex onsets include /kl, ka, k*1, pr, p1, mj/. 
The main speaker exhibits free variation between [kl] and [tl], but the younger speaker produced only 
[kl]. Coda nasals are permitted in non-final syllables. In word-final position, oral closure is rarely heard, 
and most often the corresponding vowel is nasalized. Glottalization in codas is considered a feature of 
tone. Syllabic nasals occur in non-final position only. Minor syllable onsets include /m, p, n, t, d, ts, S, 
k, ?/ with /p, t, k, ?/ occurring most frequently (Baleno 2020). 


Table 2: Kanise Khumi simple onsets 


ppb ttd kk ? 
mm ng ny 
(f) v s ts j y h 
(w) ldag 

tl 


Kanise Khumi has nine simple vowels as shown in Table 3. All simple vowels occur with oral and nasal 
phonemes. Complex vowels include /ai, au, ui, oe/. There are no examples of nasalized /ui/ or /oe/. The 
open-mid vowels are usually realized as [e°], [3°], and [9°] with a vowel height transition during the 
vowel. As with Lemi Chin (Herr 2011) and VanBik’s Khumi (2006) vowel length does not appear to 
be contrastive. Bryant often transcribed the mid-central vowel in minor syllables as [9]; however, Tan 
(2021) finds no significant difference between [9] and [3]. 


Table 3: Kanise Khumi simple vowels 


ii ut 

é 00 

€ 33 ee) 
aa 


2.2 Phonetic tones 

Tone appears to bear a low functional load in terms of lexical distinctions. Kanise Khumi speakers 
report three phonetic tones (Bryant, p.c.). Analyzing monosyllabic nouns, Ikeda (2021) described these 
as (1) high level modal, (2) falling low breathy, and (3) short mid glottalized. 

Table 4 provides visualizations of the tone contours of the three phonetic tones in similar 
environments. In Table 4, the verb root is preceded by a complement noun (object or instrument). In 
isolation each complement noun bears Tone 1, the high-level modal tone. The vertical red lines bisect 
the verb roots. In Table 5, the root is prefixed with the adjectivizer k-. The prefix has a mid or neutral 
pitch (see Section 3.3). 
Table 4: Kanise Khumi phonetic tones on verb roots preceded by a noun 

Tone 1 (H): [tuil né:1] ‘drink water’ 
Tone 2 (F): [p3:1 rit] ‘to draw in a casting net’ 
Tone 3 (M): [?é:1 sad] ‘to build a house’ 

[Pitch tracks not reproduced.] 


Table 5: Kanise Khumi phonetic tones on verb roots preceded by adjectivizer k- 

Tone 1 (H): [kal pré:1] ‘dense, thick (of forests)’ 
Tone 2 (F): [ko joN] ‘ticklish, sensitive’ 
Tone 3 (M): [koJ led] ‘big’ 

[Pitch tracks not reproduced.] 


The six tone combinations presented in Tables 4 and 5 are the most frequent in sesquisyllabic and 
disyllabic verbs. Table 6 shows all the combinations that are found in citation verbs in the word list. 
First, there appears to be no rule prohibiting sequences of like tones within a compound. Second, although 
Tone 2 occurs less frequently, it can precede or follow Tone 1 within a compound. In non-final position, 
Tone 2 does not fall as steeply or as low as the Tone 2 observed in monosyllables or final syllables. 
However, the initial pitch height and the breathy voice quality are similar. Finally, the three empty cells 
in Table 6 show sequences that are not observed in citation verbs. These sequences are attested in 
trisyllables where -ta follows the verb stem and in noun compounds. 
Table 6: Adjacent tones in Kanise Khumi 

[HH]¹ [tuil né1] ‘to drink water’      [HF] [p3:1 ci] ‘to draw in a casting net’      [HM] [?é:1 sad] ‘to build a house’ 
[MH] [rai nel] ‘to drink alcohol’      [MF]                                            [MM] [mail tso4] ‘to be old’ (age) 
[FH] [luv p*i:1] ‘bald’                [FF]                                            [FM] 


There was also a sequence with Tone 2 followed by a high glottalized final syllable in [téV 23?1 ] ‘false’. 
This presents the possibility of a high glottalized tone. However, the syllable also begins with a glottal 
stop, which seems to condition glottalization of the rhyme in many other instances. At this point, it is 
unclear whether this represents a fourth phonetic tone. 
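
The inventory of adjacent-tone sequences summarized in Table 6 can be cross-checked mechanically. The following is a minimal sketch, not part of the original analysis: the labels H, M, and F stand for Tones 1, 3, and 2 as in the table, and the set of attested sequences is simply read off the cells that contain an example. The function names are hypothetical.

```python
from itertools import product

# Tone labels as used in Table 6: H = Tone 1, M = Tone 3, F = Tone 2.
TONES = ("H", "M", "F")

# Sequences for which Table 6 gives a citation-verb example.
ATTESTED = {"HH", "HF", "HM", "MH", "MM", "FH"}


def tone_sequences():
    """Return all logically possible two-tone sequences."""
    return ["".join(pair) for pair in product(TONES, repeat=2)]


def unattested_sequences():
    """Return the sequences with no example among citation verbs."""
    return [seq for seq in tone_sequences() if seq not in ATTESTED]


if __name__ == "__main__":
    print(len(tone_sequences()), "possible sequences")
    print("not observed in citation verbs:", unattested_sequences())
```

Under these assumptions the sketch returns the three empty cells of Table 6 (MF, FM, FF), which the text reports as attested only in trisyllables with -ta and in noun compounds.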


2.3 Structure of citation verbs 

In the Kanise Khumi word list, citation verbs never appear as isolated monosyllables. In fact, isolated 
monosyllables are quite rare in the word list, with only fifty-six nouns, two pronouns, and two numbers. 
The simplest forms of verbs given in the word list are isolation forms that are composed of two syllables, 
either sesquisyllabic or disyllabic. There are 283 of these forms. This number includes a few duplicates 
where (a) related semantic ideas yielded the same form or (b) the same word was elicited from both 
speakers. All duplicates were included in this paper because they provide evidence relevant to tonal 
variation. Three types of structures are observed: arguably dimorphemic sesquisyllables, arguably 
dimorphemic disyllables, and arguably monomorphemic disyllables. 


(1) Arguably dimorphemic sesquisyllables 


a. t-root: [tod Paul] ‘to yell, speak loudly’ (n=21) 
b. p-root: [pot ni] ‘to forget’ (n=15) 
c. k-root: [ko4 dod] ‘to boil’ (intransitive) (n=9) 
d. s-root: [sod guil] ‘to follow’ (n=5) 


(2) Arguably dimorphemic disyllables 


a. ?3-root [?31 no: 1] ‘to see’ (n=109) 
b. a-root [?al bil] ‘to hide oneself’ (n=49) 
c. ay-root [?am1 pré1] ‘to run away, flee’ (n=33) 
d. ka(y)-root [kad kau] ‘empty’ (n=15) 
e. Noun–Verb [tuil né:1] ‘drink water’ (n=10) 
f. Verbal Compound [kud nau] ‘to tame’ (n=13) 

(3) Arguably monomorphemic disyllables (n=4) 

a. [ded tla] ‘bland (flavor)’ 
b. [koJ leT] ‘to tickle’ 
c. [tam] pgJ] ‘thin’ 


It is important to note that some of these forms can be combined as trisyllables. In (4), ?3- and a- precede 
all four of the arguably dimorphemic sesquisyllable types. 


¹ HH = high-high, MH = mid-high, FH = falling-high, HF = high-falling, MF = mid-falling, FF = falling-falling, 
HM = high-mid, MM = mid-mid, FM = falling-mid 




Papers from SEALS 30 — Song and Nguyen 


(4) Arguably trimorphemic trisyllables 


a. ?3-t-root: [?31 tod he] ‘to stir up (water)’ 
b. ?3-p-root: [?31 pod nH] ‘to smell’ 
c. ?3-k-root: [?31 kod tsiJ] ‘to polish’ 
d. ?3-s-root: [?31 sg ?e1] ‘to nip, to pinch with one’s fingers’ 
e. a-t-root: [?al tod mol] ‘to dry yourself (with a towel)’ 
f. a-p-root: [?al pod ted] ‘to tease’ (parsed as a p-root) 
g. a-k-root: [?alkoitsul] ‘to gather, assemble’ 
h. a-s-root: [?al sod led] ‘to dive’ 


There is a postverbal morpheme, -ta, that occurs frequently with citation verbs (n=90). It appears to be 
parallel to -de, the obligatory suffix described by Hornéy for Mro Khimi, and -te3, the nearly obligatory 
suffix that appears in Herr’s transcriptions of Lemi Chin verbs. Unlike in Mro Khimi, -ta does not occur 
in dimorphemic disyllables in the Kanise Khumi word list. Rather, -ta is appended on sesquisyllabic, 
disyllabic, and trisyllabic verbs as in (5). It is unclear why -ta is less obligatory in the Kanise Khumi 
word list data than in the Lemi and Mro Khimi word list data. 


(5) -ta postverbal marker 


a. ?-root-ta [Pod ded ta?4] ‘to mash (a tuber)’ (0353) 
b. t-root-ta [tod pil ta?4] ‘to squeeze’ (parsed as t-root ta) (0354) 
c. p-root-ta [pot jo] tad] ‘to suck’ 
d. k-root-ta [ko1 s3a1 tad] ‘tremble’ (parsed as k-root ta) (2209) 
e. ?3-root-ta [?31 k31 ta?4] ‘to grow’ 
f. a-root-ta [?al d5J tad] ‘to think about’ (2072a) 
g. Noun–Verb-ta [tsov hel ta?4] ‘to sow rice’ 
h. NOUN-a-root-ta [tsov ?al tiv ta?4] ‘to thresh rice by stamping’ (parsed as NOUN a-root ta) (0290) 
i. ?3-p-root-ta [?31 pod thou ta?4] ‘to transplant rice seedlings’ (parsed as ?3-p-root ta) (0280) 


3 Tone analysis 

The transcriptions in this article are phonetic. Most are from Bryant’s Excel spreadsheet database. In 
several words, I have revised the transcriptions based on auditory perception and acoustic evidence. I 
charted the tone patterns based on Bryant’s transcriptions and checked them against the sound files 
using auditory perception and Praat visualization. 


3.1 Postverbal -ta 
This marker appears to be parallel to postverbal markers found in Lemi Chin, Mro Khimi, and possibly 
Bangladesh Khumi. In Herr’s Lemi Chin word list, [-te3] is present on almost every verb. In her phonetic 
transcriptions, the form is always specified with the mid tone and often accompanied by a glottal stop. 
A similar morpheme /de/ follows every verb stem in Hornéy’s Mro-Khimi word list data. While Herr 
focuses on describing minor syllables, Hornéy attempts a thorough phonological analysis of the tone 
melodies found in verbs. She considers /de/ to be an obligatory clause-final particle. She proposes that 
it is underlyingly toneless, gaining its tone from the root through a process of melody association and 
rightward spreading. According to Peterson (2019a; 2018), Bangladesh Khumi has a post-verbal 
hearsay evidential clitic represented as =fe’. In his examples and sample text, =te° surfaces with low 
and high falling tones but not with the checked tones. It is unclear whether the Bangladesh evidential 
clitic is analogous with the postverbal morphemes in the other three languages. Yet it appears that there 
is a contrast between Lemi on the one hand, in which the tone is fixed, and Mro Khimi and Bangladesh 
Khumi on the other hand, in which the tone varies. 

Kanise Khumi seems to follow the same pattern as Lemi. The phonetic realization is consistent 
with Tone 3: mid pitch with either a level or slightly falling shape as well as glottalization. The tone of 
-ta is not sensitive to the tone of the verb stem. An example of -ta following each tone is provided in (6). 


(6) a. Following Tone 1 [tuil te: 1 ta?4] ‘to water’ 
b. Following Tone 2 [tsov 2a] tiv ta?d] ‘to thresh rice by stamping’ 
c. Following Tone 3 [tsov te ta?4] ‘to sow rice’ 


Any variation observed in the tone of -ta is consistent with general observations of Tone 3 in 
monosyllabic nouns (Ikeda 2021). In monosyllables, Tone 3 varies between level and slightly falling. 
This variation is also shown in (7) where most often the mid pitch is level (7a), but a slightly falling 
pitch is also frequent (7b). In seven words (7c), the pitch shape varies freely between the two shapes, 
indicating that the difference is not significant. Furthermore, right edge contour effects have been 
widely observed in languages related to Kanise (Evans 2009; Ikeda 2021; Herr 2011; Hornéy 2012). 
Generally, Speaker 2 tends to show the right-most edge falling contour as in (7d). In such cases, voice 
quality becomes the most reliable cue. 


(7) a. -tal (n=50) 
b. -ta) (n=26) 
c. -tal varies with -ta‘ (n=7) 
d. -ta\ (n=7: Speaker 2; creaky voice) 
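
The token counts in (7) can be tallied against the total number of citation verbs with -ta (n=90) reported in Section 2.3. The sketch below is only an arithmetic check under the assumption that the four categories in (7) are mutually exclusive; the category labels are descriptive paraphrases, not terms used in the source.

```python
# Counts of -ta pitch shapes as listed in (7a-d).
TA_SHAPES = {
    "mid level (7a)": 50,
    "mid slightly falling (7b)": 26,
    "varies between the two shapes (7c)": 7,
    "right-edge fall, Speaker 2 (7d)": 7,
}

# Total number of citation verbs with -ta reported in Section 2.3.
REPORTED_TOTAL = 90


def shape_shares(counts):
    """Return each category's share of all -ta items, in percent."""
    total = sum(counts.values())
    return {label: round(100 * n / total, 1) for label, n in counts.items()}


if __name__ == "__main__":
    print("sum of (7a-d):", sum(TA_SHAPES.values()), "reported:", REPORTED_TOTAL)
    for label, share in shape_shares(TA_SHAPES).items():
        print(f"{label}: {share}%")
```

On these figures the four categories add up to the reported 90 items, with the level realization accounting for a little over half of them.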


Moreover, it is precisely at the right edge where voice quality is most noticeable. The verb ‘to grow’ 
has Tone 3 in both the verb stem and -ta. When the speaker pronounced the entire item [?31 k31 ta?4] 
‘to grow’, the final syllable was more glottalized than the penultimate syllable. When he isolated the 
verb stem [?31 k31?], the final syllable [k31] was strongly glottalized. Then [ta?4] was repeated three 
times in isolation with strong glottalization. 

In a few other cases, the speaker parsed -ta separately from the verb stem. Figure 1 shows the pitch 
curve of [tuil te:1 ta?4] ‘to water’ in blue dots. The vertical red line cuts through the center of -ta. The 
pitch of -ta is mid, slightly falling. Glottalization is evident in the breakup of F0 near the end of the 
syllable. Figure 2 presents three repetitions of -ta when the speaker parsed ‘to water’ into three parts: 
[tuil] ‘water’, the verb stem [te:1], and [ta?4]. This suggests that the speaker treats -ta as a functional 
morpheme. 
Figure 1: [tuil te:1 ta?4] ‘to water’ 

[Pitch track not reproduced.] 


Figure 2: -ta repeated three times in isolation 
3.2 Valency affecting prefixes 


3.2.1 Valency increasing p- and t- 
What about the tones of preverbal affixes? Valency increasing prefixes have been observed across a 
variety of Chin languages and quite consistently within the Khumi cluster. The most common forms 
are bilabials m- and p-. These are attested in Lamkang, Mara, Daai, K’Cho, Mro Khimi, Lemi, 
Bangladesh Khumi, and Rengmitca (So-Hartmann 2013; Peterson 2019b; Peterson 2013). 

In Lemi, causative and transitive verbs can be derived by prefixing free verb roots that denote 
intransitive states and activities. According to So-Hartmann (2013), m- and b- are in free variation, and 
t- is another form with lower productivity. She also gives examples revealing that m- and t- have been 
lexicalized and frozen with verb roots and noun roots. These are treated as toneless minor syllables by 
Herr (2011) as in /m.na3.te?/ ‘to kiss’. Herr treats all Lemi minor syllables as phonologically toneless 
and states that they surface with a default mid-tone. So-Hartmann (2013) presents similar causative- 
transitive prefixes for Mro Khimi, with both m- and t-. Hornéy (2012) analyzes all Mro Khimi verb 
prefixes as toneless minor syllables that undergo vowel insertion and polar tone assignment as in /m- 
so-de/ > [mdsadé]. Rengmitca also has a causative m- prefix (Peterson 2019b). 

Bangladesh Khumi has a semi-productive causative p- prefix and a less frequent, less transparent 
causative t- prefix which may be fossilized (Peterson 2013). Peterson (2013) suggests that t- may be an 
allomorph of p- that primarily occurs before bilabial and h initials, and So-Hartmann (2013) comments 
that this may be the case in Lemi and Mro as well. Peterson represents these valency increasing prefixes 
as toneless minor syllables in Bangladesh Khumi. Similar to Lemi and Mro Khimi, they are “more 
transitive” than their free verb root counterparts. Peterson gives examples such as tlang*/p ’tlang* ‘melt’ 
and mé“/t’mé* ‘twist’. Peterson also provides evidence of pairs with the highly frequent middle marker 
such as aca’/p’ca? ‘bounce’ and apew'/t pew! ‘explode’ (see Section 3.2.2). 

These same valency increasing prefixes are attested in Kanise Khumi. A similar pair was elicited 
for ‘to yell, speak loudly’, a-root [?a1 ?aul] and t-root [to/?aul]. In Kanise Khumi, p- never precedes a 
bilabial initial or /h/, but t- does. The pitch patterns for p- and t- are given in Table 7. Three surface 
pitch patterns are observed. The pitch of the prefix is predictably mid, regardless of the tone of the 
following syllable. There is no indication of polar tone. Unlike -ta, the valency increasing prefixes are 
never separated from the verb root. 


Table 7: Pitch patterns with prefixes p- and t- 


Pitch Pattern   Examples                                # of items 
MH              [pol ?o1] ‘to bake clay’                p- = 6 
                [toi me: 1] ‘to forget’                 t- = 3 
MF              [pot k*en] ‘to fetter’                  p- = 1 
                [tal ye] ‘to war, to wage war’          t- = 3 
MM              [pot ni] ‘to kiss’                      p- = 8 
                [tod p'rad] ‘to erase, delete’          t- = 15 


The prefix syllable tends to be of noticeably short duration (roughly 50-90 ms), so pitch is often 
significantly perturbed by surrounding consonants. The differences in pitch height and pitch shape 
across words are similar to the differences across tokens in the same word (See Figure 3). Therefore, it 
is difficult to say that these differences merit analysis as different tone patterns. 


Figure 3: Three tokens of [to/?aul] ‘to yell, speak loudly’ 

[Pitch tracks not reproduced; total duration 12.217604 seconds.] 


Voice quality cues on the major syllable (verb root) were consistent with the bundle of features observed 
in monosyllabic nouns. The high-level pitch is correlated with modal voice (Tone 1). The pitch fall 
from mid to low is correlated with breathy voice (Tone 2). The mid tone is shorter and glottalized and 
often creaky with either level or slightly falling pitch shape (Tone 3). Speaker 2 continued to exhibit a 
right edge boundary effect on Tone 3 with voice quality the most reliable cue. Yet with the prefix 
syllable, it is difficult to perceive voice quality cues. Perhaps this is due to the short duration. When the 
prefix precedes an initial glottal stop, it sounds quite glottalized. 

Interestingly, the speaker does not associate the vowel in the presyllable with the same 
orthographic representation as a main syllable mid-central vowel. He always writes the presyllable 
vowel as <a> whereas in final syllables, a mid-central vowel is written as <ay>. Perhaps these prefixes 
are fossilized and now form “reduced” syllables. Although the presyllables are short and predictably 
have mid pitch, we cannot conclude that they are lexically specified with Tone 3. As in Lemi Chin, 
minor syllables may be toneless and simply realized with a “neutral” pitch. 


3.2.2 Valency-decreasing a(y)- 
The pitch patterns associated with the full syllable prefix a(y)- are quite different in that the prefix has 
predictably high pitch, much like Tone 1. 

Peterson (2013) describes Bangladesh Khumi as having a highly frequent, toneless middle marker 
a- with reciprocal, passive, and anticausative senses. There are verb pairs where the verb prefixed with 
a- is less transitive than the verb root as in abang*/bang? ‘hang’. In addition, verb pairs with a- vs. p-/t- 
can be compared in terms of valency as in acda’/p’cd? ‘bounce’ and apew’/t’pew’ ‘explode’. Finally, 
Bangladesh Khumi has optional nasals associated with a- as in a(m)lé/p’lé! ‘roll’, a(ng)hd?/p ha? 
‘slide’ and a(ng)hiing’/t’hiing? ‘shake’. While Peterson treats this marker as a full syllable, he describes 
it as atonal. 

Equivalent valency-decreasing prefixes have been observed in Lemi and Mro Khimi (So-Hartmann 
2013). Lemi intransitive, reflexive, and reciprocal verbs can be derived from transitive verbs with the 
prefix ae-. The Lemi transitive verbs can simply be verb roots as in aehi ‘y is spread’/hi ‘x spreads y’. 
However, the transitive verbs may also have the m-/t- prefixes to which ae- is added as in aemdicaw ‘x 
feeds himself’/mdcaw ‘y feeds x’ and aetdimiing ‘x+y suppress each other’/tdmiing ‘x suppresses y’. 
With some verbs expressing body posture, the ae- intransitives are lexicalized and there is no more 
transitive counterpart. 

So-Hartmann (2013) gives examples of a valency-decreasing ka- in Mro Khimi, such as kamshie 
‘x washes himself’/mshie ‘y washes x’ and kabraan ‘x+y fight with each other’/braan ‘x quarrels with 
y’. However, she states that the ka- for deriving reflexives is not really productive and that reciprocal 
verb derivations are rare. 

Again, we find a dichotomy between Lemi and Mro Khimi, where Lemi full syllable prefixes seem 
to carry tone and Mro Khimi prefixes display polar tone assignment. Hornéy (2012) seems to have 
included ka- in her analysis of polar tone prefixes, for she has [kat*gde] ‘boast’ and [kas*ide] ‘wear’ as 
examples of pitch patterns with low and high pitch respectively. Table 8 provides a few comparisons, 
illustrating that Herr analyzed Lemi’s ae- as a full syllable, surfacing with a mid glottalized tone. 


Table 8: Comparative data for Lemi (Herr 2011), Bangladesh Khumi (Peterson 2013) and Kanise Khumi 

    Lemi              Bangladesh Khumi    Kanise Khumi 
a.  ?¢3 10° te?       a(m)lé !            ?an] led        ‘roll’ 
b.  ?e?.po>.te?       apung?                              ‘marry’ 
c.  ?e3.pre3.te?      a(ng)pre?           ?4m1pre1        ‘divorce’ 


Again, Kanise Khumi aligns with Lemi Chin. In 79 out of 82 items, the prefix is pronounced with high 
pitch. Voice quality sounds modal, except when preceding a verb root that begins with a glottal plosive. 
There are three exceptions where a- appears with a mid pitch. One of those cases is the verb ‘to sing’ 
[?a1 20e1] #2150; however, when the same verb appears at another point in the word list as ‘to sing (of 
birds)’ [?al ?0e1] #784, the prefix had high pitch. The three exceptions may indicate that the pitch of 
the prefix is non-contrastive, even if tone is lexically specified. 
Table 9: Pitch patterns with prefixes a- and ay- 

Pitch Pattern   Examples                                   # of items 
HH              [?a1 bil] ‘to hide (oneself)’              a- = 12 
                [?4m1 prél] ‘to run away, to flee’         ay- = 14 
HF              [?al gev] ‘to wait’                        a- = 10 
                [?am 1 p3\] ‘to cross over, traverse’      ay- = 6 
HM              [?a1 b#] ‘to wear in your ears’            a- = 25 
                [?a4m 1 pred] ‘to divorce’                 ay- = 12 
MH              [?a1 ?oe1] ‘to sing’                       3 in total 
                [?a1 ma:1] ‘to dream’ 
                [?and ré]] ‘rich’ 
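
The proportion of high-pitched prefixes reported above (79 out of 82 items) can be recomputed from the cell counts in Table 9. This is a minimal sketch that assumes the counts are as legible in the table for the three high-prefix patterns and that the remaining three items are the mid-pitch exceptions mentioned in the text; the variable and function names are hypothetical.

```python
# Item counts per pitch pattern and prefix form, as read from Table 9.
HIGH_PREFIX_COUNTS = {
    "HH": {"a-": 12, "ay-": 14},
    "HF": {"a-": 10, "ay-": 6},
    "HM": {"a-": 25, "ay-": 12},
}

# The text reports three exceptions in which the prefix has mid pitch.
MID_PREFIX_ITEMS = 3


def high_prefix_total(counts):
    """Sum all items whose prefix is realized with high pitch."""
    return sum(sum(by_prefix.values()) for by_prefix in counts.values())


def high_prefix_share(counts, exceptions):
    """Return the percentage of items with a high-pitched prefix."""
    high = high_prefix_total(counts)
    return round(100 * high / (high + exceptions), 1)


if __name__ == "__main__":
    high = high_prefix_total(HIGH_PREFIX_COUNTS)
    print("high-pitched prefix:", high, "of", high + MID_PREFIX_ITEMS)
    print("share:", high_prefix_share(HIGH_PREFIX_COUNTS, MID_PREFIX_ITEMS), "%")
```

The totals agree with the 79 of 82 items cited in the text, and with the 49 a-root and 33 ay-root items listed in (2) once the three exceptions are added.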


The speaker seems to associate the prefix with Tone 1. Generally, the speaker parses ‘words’ so that 
the prefix is bound to the verb root. However, it is possible for him to isolate the valency-decreasing 
prefix. In two items, the speaker isolated the prefix from the verb root. For example, #712 ‘to kick (of 
a horse)’ [?al pil] was repeated twice and then [?a:1] and [pit] were each repeated three times. When 
the prefix was pronounced in isolation, the duration was long, the pitch was high, and the voice quality 
was modal. These are the features of Tone 1. Similarly, the form of the prefix with the nasal [?4:] was 
repeated three times in isolation with the verb [?an1 3au4] ‘to carry together (two people carry objects 
tied to a pole that rests on their shoulders)’ #2431. Again, the phonetic cues were like Tone 1: long duration, 
high pitch, and modal voice quality. Further evidence for lexical specification with Tone 1 comes from 
trisyllabic forms. Like Lemi, Kanise Khumi has forms where the valency-decreasing prefix is added to 
transitive verbs that begin with p- and t-. The speaker parsed [?al pod ted] ‘to tease’ #2293 as [?al] and 
[pod ted], with [?al] carrying Tone 1. 

As Peterson (2013) noted, it is difficult to explain the optional nasal. It is unclear why it appears 
on some verbs and not others. However, the phonotactic restrictions on coda consonants are consistent. 
In final position, the nasal plosive is deleted and pronounced as a nasalized vowel. In non-final position, 
the place of articulation of the nasal assimilates to the place of articulation of the following consonant. 
Before an initial glottal plosive, the nasal is velar, a pattern that has also been observed for presyllables 
in Katuic languages of the Austroasiatic phylum (Gehrmann 2017). Bangladesh Khumi, on the other 
hand, appears to have both am- and ang- forms that are not sensitive to the place of articulation of the 
following syllable. 
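The coda-nasal pattern just described can be restated as a small rule. The following Python sketch is only an illustration of that description: the function name, the consonant classes, and the use of a combining tilde for nasalization are assumptions made here, not part of the analysis or a full inventory of Kanise Khumi onsets.

```python
# Sketch of the coda-nasal pattern described above. The consonant classes
# below are illustrative assumptions, not an inventory of Kanise Khumi.
PLACE_OF_ONSET = {
    "p": "labial", "b": "labial", "m": "labial",
    "t": "coronal", "d": "coronal", "n": "coronal", "s": "coronal",
    "k": "velar", "g": "velar",
    "ʔ": "glottal",
}

# Nasal chosen for each place of articulation. The velar nasal before a
# glottal plosive follows the statement in the text.
NASAL_FOR_PLACE = {
    "labial": "m",
    "coronal": "n",
    "velar": "ŋ",
    "glottal": "ŋ",
}

COMBINING_TILDE = "\u0303"  # written after a vowel to mark nasalization


def realize_coda_nasal(vowel, following_onset=None):
    """Return the surface shape of a vowel plus an underlying coda nasal.

    With no following onset (final position) the nasal is dropped and the
    vowel is nasalized; otherwise the nasal takes the place of the onset.
    """
    if following_onset is None:
        return vowel + COMBINING_TILDE
    place = PLACE_OF_ONSET.get(following_onset)
    if place is None:
        raise ValueError(f"no place class assumed for onset {following_onset!r}")
    return vowel + NASAL_FOR_PLACE[place]
```

Under these assumptions, a prefix vowel before a p-initial root surfaces with m, before a glottal plosive with the velar nasal, and in final position as a nasalized vowel. The sketch does not model the Bangladesh Khumi am- and ang- forms, which are reported not to be sensitive to the following onset.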


3.3 Adjectivizers k- and ka(y)- 

The adjectivizers k- and ka(y)- provide another comparison of tone patterns in prefixes. Synchronically, 
Kanise Khumi appears to have multiple forms with similar functions. Several adjectives are formed 
with either k- or ka(y)-. They can modify a noun as in (8). The nasal is often, but not always, homorganic 
with the onset that follows. Sometimes ka- is simply nasalized and there are even a couple of cases 
where ka- is not nasalized. 


(8) a. [tui] kand nu] ‘turbid water’ 
b. [tuil kad tsail pi:1] ‘clear water’ 


In unpublished notes, entitled “Lemi Functional Prefixes”, So-Hartmann lists three prefixes as having 
adjectivizer functions: kd-, kae-, kang-. The first two also derive nominals in Lemi. In addition, Mro 
has a prefix ka- that derives agentive nominals from activity verbs (So-Hartmann unpublished; So- 
Hartmann 2008). 

Whether the adjectivizer is a minor syllable or a full syllable, the pitch is almost always mid as 
shown in Tables 10 and 11. There are only two exceptions. The tone of the prefix does not appear to be 
sensitive to the tone of the following syllable. 
Table 10: Pitch patterns with k-


Pitch Pattern Examples # of items 
MH [kod t'31] ‘small’ 4 
MF [kal jon] ‘ticklish, sensitive’ 2 
MM [kod led] ‘big’ 2 
HF [kol ?a] ‘shine (the sun shines)’ 1 


In many ways, k- behaves phonetically like p- and t-. The prefix has very short duration and the pitch 
is almost invariably mid. Glottalization is noticeable when the following onset is a glottal plosive. There 
is no way to determine whether the prefix is toneless with neutral pitch or whether the prefix carries 
Tone 3 with the voice quality feature neutralized. On the other hand, the full syllable ka(y)- appears to 
be specified as Tone 3. In four cases, the speaker isolated the prefix and pronounced it very clearly with 
the features of Tone 3: short duration, mid pitch, glottalization. 


Table 11: Pitch patterns with ka(y)-


Pitch Pattern Examples # of items 
MH [kand sé1] ‘yellow’ 5 
MM [kai kay] ‘empty’ 4 
FH [ka hé1] ‘raw (not cooked)’ 1 
MF [ka(y)1 10] ‘white’ 5 


3.4 3rd-person participant marker ?3-

With the previous sections in mind, it would be tempting to argue that Kanise Khumi patterns like Lemi:
minor syllables are toneless and major syllables are tone-bearing. In contrast to Mro
Khimi, we find no evidence to support the hypothesis that all affixes derive their surface
tone from their relation to the verb root. Yet the preverbal 3rd-person participant marker ?3- creates a problem
for an analysis based on syllable structure.

In some Tibeto-Burman languages, verbs can be marked for pronominal agreement (Matisoff
2003; Bauman 1974). Matisoff remarks that a language may often use a verb prefix showing agreement
with a 3rd-person subject that corresponds with the possessive function of the same prefix on nouns. He
gives Lai Chin as an example. Kanise Khumi has a parallel set where the 3rd-person verbal
agreement marker ?3- (9a) resembles the minor syllable prefix ?a- on nouns with inalienable possession (9b)
(such as parts of the body, parts of a plant, bodily fluids). The relationship between the verbal marker
?3- and the 3rd-person independent pronouns is highly transparent as illustrated in Table 11.


(9) a. [231 ded] ‘to die’ 
b. [2a4 thai?] ‘fruit’ 


Table 11: Independent pronominal elements in Kanise Khumi 


Singular Dual Plural 
Inclusive Exclusive Inclusive 
1st kai [kail] ai hni kai hni [kail nid] ai ci 
2nd nang [nal] nang hni [nan] nid] nang ci [nan tsiv] 
3rd _| y ni [?3(1/1) ni?4] (y) ni hni [nin ni1] (y) hni ci [nil tsin] 


While many Chin languages have robust pronominal agreement, such agreement is weaker in Khomic 
languages. So-Hartmann wrote that there is no pre-verbal participant reference in Lemi and only object 
agreement marking in Mro. Peterson (2002) explains that participant reference marking can
be elicited in Bangladesh Khumi, but it is not frequent in his corpus of texts. It is likeliest to be found 
as encoding speech act participants in reported speech in narratives, in perspective shifting, and in
speech act participants in conversation. He specifically states that he cannot speak for varieties in 
Myanmar. In the Kanise word list elicitation, ?3- was the most frequent preverbal element by far, though 
not obligatory. 

The speaker parsed ?3- as a separate element a few times during the word list elicitation, but rarely. 
The pitch patterns are displayed in Table 11. There was no indication of polar tone. Overwhelmingly, 
the peak of the first syllable is higher than the peak of the second syllable. There was more variation 
within the HM pattern than in any other. The voice quality of the first syllable sounds glottalized when
it precedes a glottal plosive onset.


Table 11: Pitch patterns with ?3- 


Pitch Pattern Examples # of items 
HH [?31 tol] ‘sour’ 18 
HF [?31 sov] ‘spicy, hot (of peppers)’ 29 
HM [231 t'od] ‘to whittle, cut out, to shape, to sculpt’ 45
i eM [?31 tsaiN] ‘clean’ 16 
MM [?91 ke] ‘to gnaw’ 1 


Bryant generally transcribed the marker as ?3-, but sometimes he used ?a-. Thus far, acoustic analysis
has not indicated a distinction between the two (Wen this volume). In terms of duration and vowel
quality, the marker appears remarkably like the minor syllables occurring with p-, t-, k-, and s-. Yet
pitch seems to be the feature that sets ?3- off from the others. Figure 4 illustrates this difference. The
speaker gave two alternatives for ‘true, real, factual’: first [?31 tev] and then [kad tev]. It is not clear how 
these differ semantically, but the crucial point is that the figure displays the difference in pitch between 
?3- and k-. The first two repetitions are [?31 tév] and the second two are [ko1 te]. 


Figure 4: Praat visualizations of [P31 te\] and [ka1 te\] ‘true, real, factual’ 
The tendency for ?3- to be realized with a high pitch raises the question of whether it bears Tone 1, the
high-level modal tone. Indeed, the phonetic cues for Tone 1 were evident when the speaker parsed ?3-
separately from the rest of the verb in trisyllabic verbs. In (10), the speaker first pronounced the verb
‘to chop down, to knock down’ as [231 pad tol]. He then parsed the verb into two parts: [?31] and [po4 
tol]. Next, he gave the verb in context with an object complement ‘tree’ [té:1 kt:1 ?31 pod tol]. Finally, 
he parsed the object separately from the verb: [t'é:1 kt: 1] was repeated three times and then [?31 po4 
tol] was repeated three times. 


(10) t8é:1kt:1 = ?31 pod tol 
‘tree’ ‘chop down’ 


In this way, the speaker made units of analysis visible. ?3- is a unit that can be pronounced in isolation.
When it is spoken in isolation, it carries the phonetic cues of Tone 1. The duration is consistent with
Tone 1 monosyllables; the voice quality is modal, and the pitch is high level. Thus, ?3- differs from the
other non-final syllables with a short, mid-central vowel, namely p-, t-, k-, s-, in that it is separable from
the verb stem and in that it seems to be specified for tone at some level. A psychological distinction is
also evident in the orthography. In word-final syllables, a mid-central vowel is represented as <ay> and
a mid-central vowel in non-final “minor syllables” p-, t-, k-, s- is represented as <a>. On the other hand,
the 3rd-person verbal agreement marker is represented as <y>.


4 Conclusion 

The analysis of tone patterns in Kanise Khumi citation verbs differs from the analysis of Mro Khimi and
Lemi Chin. In Mro Khimi, there were varied tone melodies with each morpheme, such as postverbal
-de and preverbal elements like ma- and ka-. This variation motivated Hornéy’s analysis that affixes are
underlyingly toneless, with tone being specified through post-lexical processes like melody association 
and polar tone assignment. While there are a few exceptions in each category, the Kanise Khumi 
morphemes examined in this paper tend to surface with predictable pitch, not variable pitch. The pitch 
of an affix does not appear to be sensitive to the pitch of the verb root. Two morphemes, -ta and ?3-, do 
involve a greater proportion of variation than the other morphemes discussed here. It is important to 
note, however, that with these two morphemes, the variation in pitch across words is matched with a 
high amount of variation across tokens of the same word. 

It is also difficult to characterize Kanise Khumi based on syllable structure. In Lemi Chin, full 
syllables bear tone, whereas minor syllables are phonologically toneless, surfacing with a neutral mid 
pitch. Minor syllables in Lemi Chin also reflect other constraints that are widespread in Southeast Asian 
languages. The duration is noticeably short in comparison to full syllables, producing an iambic rhythm. 
Onset consonants in these syllables are restricted in number and complexity of phonemic features. 
Phonetic vowel quality is loosely restricted to a range of mid-central vowel qualities. These same 
features are evident in several of the Kanise Khumi prefixes, notably the valency-increasing prefixes p-
and t-, and the adjectivizer k-. With predictable mid pitch, these prefixes support an analysis based on
syllable structure in which full syllables are tone-bearing and minor syllables are toneless. Prefixes a(y)-
and ka(y)- would fit with this analysis in that the vowel is not “reduced” and they allow coda consonants.
Furthermore, they do not share the same pitch patterns as p-, t-, and k-, even though they have similar
valency-affecting and adjectivizing functions. Valency-decreasing a(y)- could be analyzed as a full
syllable that bears Tone 1, the high-level modal tone. The adjectivizer ka(y)- could be interpreted as a
full syllable that bears Tone 3, the short mid glottalized tone. Such an analysis would be supported by
the rare occasions when the speaker parsed a(y)- and ka(y)- separately from the verb stem. In these
cases, a(y)- carried Tone 1 and ka(y)- carried Tone 3.

Nevertheless, the pitch patterns involving the 3rd-person agreement marker ?3- pose a problem for
such an analysis based on syllable structure. This marker shares many of the same features as a minor
syllable: (1) non-final, (2) short duration, (3) mid-central vowel quality, (4) no coda consonant. Yet, the
pitch of ?3- cannot be predicted in the same way that it is for the other minor syllable preverbal elements.
The pitch of the other minor syllables p-, t-, k-, s- is usually mid. While the pitch of ?3- is mostly
predictable, it is generally high, not mid. Unlike the other minor syllables, ?3- is separable from the verb
stem. When separated, it clearly bears Tone 1, a high-level modal tone. Table 12 highlights the
transitional status of ?3-. In some ways, ?3- patterns like the minor syllable prefixes p-, t-, k-, s-. In other
ways, ?3- patterns like the full syllables a(y)- and ka(y)-.
Table 12: Summary of non-final syllable features

Feature                        p-, t-   k-     s-     ?3-              a(y)-            ka(y)-
semantic function              ✓        ✓      ?      ✓                ✓                ✓
iambic rhythm                  ✓        ✓      ✓      ✓                ✓                ✓
restricted onset /p,t,k,?,s/   ✓        ✓      ✓      ✓                ✓                ✓
coda permitted                 ✗        ✗      ✗      ✗                ✓                ✓
vowel quality                  MCV      MCV    MCV    MCV              a                a
separable                      ✗        ✗      ✗      ✓                ✓                ✓
orthographic vowel             <a>      <a>    <a>    <y>              <a>              <a>
predictable pitch              M        M      M      H                H                M
voice quality (in isolation)   *        *      *      modal (Tone 1)   modal (Tone 1)   glottalized (Tone 3)

(MCV = mid-central vowel; * = unattested)


The strong preference for iambic rhythm in Kanise Khumi words means that non-final syllables almost
always get “reduced” in lexical derivation. Lower amplitude and intensity and shorter duration are usually
evident even with compounds in Kanise Khumi. So-Hartmann and Peterson commented on the
productivity of the minor syllable prefixes in other Khomic languages, arguing that there was evidence
that they may be fossilized. The ability to separate ?3-, a(y)-, and ka(y)- from the base may be an
indication that these are still productive morphemes in Kanise Khumi. It may also be evidence that ?3-
relates to the verb stem differently than p-, t-, k-, s-. This conclusion is supported by the difference in
pitch realizations where ?3- is generally high and the others are usually mid.

It does not seem likely that all affixes in Kanise Khumi are toneless. Preverbal a(y)-, ka(y)-, ?3- as
well as postverbal -ta appear to be specified for tone, at least phonetically. Similarly, tone patterns on
Kanise Khumi verbs cannot be explained by syllable structure alone, because ?3- patterns
like a(y)- and ka(y)- rather than p-, t-, k-, s-. Instead, tone patterns on Kanise Khumi affixes need to be
analyzed with reference to each affix. Furthermore, the morphological structure cannot be neglected.
West (2014) proposed a typology of minor syllables based on gradient realizations of pitch: (1)
non-TBU, optionally epenthesized; (2) morpheme-internal, lexically specified TBU; (3) stem-external,
lexically specified TBU; (4) reduced, not phonemically distinct; (5) reduced, phonemically distinct. It
is possible that ?3- patterns differently because it is a stem-external minor syllable, whereas p-, t-, k-, s-
may no longer be functional morphemes. Perhaps they have become fused with the verb root through a
process of lexicalization. A great deal more data would need to be analyzed to investigate this
hypothesis. At this point, it is enough to say that we must take each morpheme on its own terms when
it comes to tone analysis.
Abstract

This paper investigates the correlation of form and function in Burmese noun phrases. 
Burmese has several ways of combining nouns with verbal and other modifiers, which 
exhibit different degrees of syntactic boundness and semantic integration. The claim is that 
syntactic and semantic boundness are predictive in that syntactically more tightly bound 
expressions coincide with closer semantic integration. The results partly confirm this claim 
though there are other factors determining the choice of a specific construction in a given 
context in Burmese. The study, which concentrates on verbal modifiers, is based on a wide 
range of sources, including grammatical descriptions, lexica, as well as formal and 
colloquial texts. Additional examples have been constructed and checked with native 
speakers. 


Keywords: Burmese, Noun phrase, word order, semantics and syntax 
ISO 639-3 codes: mya 


1 Introduction 

In the present study, we investigate different types of noun-modifier collocations in Burmese. These 
constructions exhibit clearly different degrees of syntactic boundness, and they are in most cases not, 
or not completely, coextensive in their functional ranges. That is, in many cases, the speaker has to 
choose one among several available constructions, the choices often being not interchangeable. Of 
special interest are the NPs containing a noun and a verbal modifier, which exhibit the widest range of 
formal possibilities, each with its own inherent semantics. Although the expressions are not generally 
mutually exchangeable, the syntax-semantics correlation is not always obvious and requires an in-depth 
analysis. One hypothesis that is looked into in this study is the idea that formal (morphosyntactic) 
boundness iconically corresponds to functional (semantic) boundness (e.g., papers in Haiman 1985). 
There is evidence that this claim is at least partly true in Burmese, though it is not the only factor 
determining the choice of a specific construction over another one in a given context. 

The noun attributes investigated in this paper are preposed and postposed verbal and nominal 
modifiers of the types mjin-p’ju (mjin-bju)* ‘horse-be.white’,? p*ju-03-mjin ‘be.white-ATTR-horse’, and 
mjin-rap'ju ‘horse-NML.be.white’, all translated as ‘a/the white horse’. These expressions are treated as 
synonyms by Pe Maung Tin (1963), and are generally not analyzed in indigenous grammars, such as 
the latest edition of the Myanmar Thadda (Myanmar Language Commission 2005). Obviously for the 
Burmese, they pose no problem and do not appear to deserve room for elaboration in prescriptive 
grammars. 


1 This paper was planned to be written in collaboration with Rudolf Yanson who unexpectedly passed away in
May 2021. It is a sad honor for me to finish this piece of work on which we had been working together for
over a year. The draft was well advanced when we last were in contact in late April, and I made every effort
to finish it to reflect Rudolf’s perspective as much as my own.

Voicing occurs in close juncture after plain and nasalized vowels. It is regular in grammatical morphemes, but 
less so in lexical items. Where voicing is consistent (and in some cases phonemic), it is indicated in this paper. 
In cases where there is variation, either individual or regional, the unvoiced form is given. 

In glossing Burmese verbs, we use the copula ‘be.V’ for stative verbs (corresponding to adjectives in English)
and ‘to.V’ for active verbs. 
2 Types of noun modifiers

In Burmese, there are several types of constructions linking nouns with different types of attributes, 
some with clear morpho-syntactic structure and transparent semantic composition. Other constructions 
are more challenging, both in terms of syntax and semantics. In particular, the latter deserve a closer 
investigation. 


2.1 Nominal modifiers 

The first type is represented by constructions containing two nouns [N1N2], where N2 unambiguously 
functions as head and N1 as an attribute. This pattern is consistent with the general head-final structure 
of Burmese. Examples of this type are we?-@a ‘pork’ (lit. ‘pig-flesh’) and cé-jou? ‘statue’ (‘brass-figure’)
(Jenny & Hnin Tun 2016:112ff). [N1N2] constructions are different from superficially similar
expressions where N1 is a possessive modifier. In this case, N1 is marked as possessive (or dependent)
by the “induced creaky tone” (Okell 1969) where possible. This morphological distinction is neutralized
if N1 ends in a glottal stop or creaky tone. Compare lu-mjo ‘ethnicity’ (‘man*-kind’) with lui-bawa
‘human life’ (‘man.DEP-life’). The former may be termed a NOMINAL COMPOUND, the latter a NOUN
PHRASE. Another type of [N1N2] contains two nouns in appositional relationship. That is, they both are 
syntactically speaking heads, both referring to the same entity, one being more general, the other more 
specific. This type of construction is seen in kinship and social terms with proper names, as in s’aja 
win-Paun ‘teacher Win Aung’. A further type of [N1N2] construction is seen in coordinate compounds, 
where neither of the nouns functions as modifier. An example of this type of construction is ?ap'e-Pame
‘parents’ (‘father-mother’). This last type is clearly beyond the scope of this paper. 


2.2 Verbal modifiers 

Verbal modifiers can be combined with nouns in four different constructions. In the following 
subsections, the four types of N-V collocations will be presented with their formal and functional 
properties. 


2.2.1 Marked attributives 

The most frequent and flexible of the modifying constructions is the one consisting of a verb attached 
to the following noun by a special syntactic marker 93/05 in formal or té/dé in colloquial language. 
These two markers will be glossed as ATTR in all examples, and the construction is given here as 
[VATTRN]. Relevant examples are ci-dé/05 Pein ‘a/the big house’ (‘be.big-ATTR house’) and ni-dé/d9 
ka ‘a/the red car’ (‘be.red-ATTR car’). 

The syntactic structure of [VATTRN] is identical to more general attributive (relative) clauses 
modifying a nominal head, with the clause consisting minimally of a verbal predicate. In the same 
fashion, a more complex clause can be attached to a noun, including full arguments, TAM, polarity, 
and other categories. Attributive clauses in the pattern [CLAUSEATTRN] also allow tense modification 
of the clause, with the attributivizer appearing either as non-future té/dé (literary Burmese 03/05 or 61/01) 
or future mé (literary Burmese mji).” The examples above could therefore be translated as ‘a/the house 
that is big’ and ‘a/the car that is red’, respectively. In this paper, we will discuss only unmarked 
attributive verbs of this pattern, without overt arguments and TAM/polarity marking. 


2.2.2 Nominalized verb as modifier 

Another type of construction, the most challenging one in terms of form and function, consists of a noun
and a postnominal verb in nominalized form, [NaV] (see also Nosova 1974). The verbal noun is formed
by adding the prefix ?a- to any verb, the most common and flexible means to form nouns from verbs.


The gloss ‘man’ for lu is to be understood in the sense ‘human being’, irrespective of gender.

Speakers of Burmese employ two different styles of the language, viz. literary and colloquial, distinguished 
mainly by their grammatical forms (Okell 1994). The former is used in formal contexts, the latter in informal 
spoken settings. While literary Burmese used to be the only language of writing, colloquial Burmese appears 
increasingly in modern prose, sometimes in a mixed style. 
This type of construction is seen in expressions like Pein Paci ‘a/the big house’ (‘house-NML.be.big’)
and ka Pani ‘a/the red car’ (‘car-NML.be.red’). 

The nominalizing prefix ?a- is very old in Burmese, going back to proto-Tibeto-Burman origins, 
where a nominalizing prefix *?a- can be reconstructed with nominalizing and other functions (Matisoff 
2003:104ff).° It has a general nominalizing function, with no specification as to whether the 
nominalized form is a verbal abstract, an instrument, an actor, object, or any other function with a 
logical relation to the verbal semantics. The following examples, mostly taken from Okell and Allott 
(2001), illustrate the different functions of [aV] nominalizations. 


Base verb     Gloss             Derived noun    Gloss
louP          ‘to do’           Palour          ‘work’
Win           ‘to enter’        PaWwin          ‘entry’
t'tt-s'aNn    ‘be unusual’      Pat'u-Pas'an    ‘something unusual’
Gi            ‘to know’         Pat             ‘knowledge, acquaintance’
i             ‘to.bear.fruit’   Pai             ‘fruit’
kaun          ‘be good’         Pakaun          ‘goodness, a good one’
s'o           ‘to sing, say’    Pas"o-do        ‘singer’⁷


The questions arising about the [NaV] constructions are both syntactic and semantic. It is not at first 
sight obvious what the syntactic make-up of [NaV] expressions is, that is, which part (if any) is to be 
seen as head, which as modifier, and how the two parts relate semantically. These points will be 
addressed in more detail in section 4. 


2.2.3 Bare verb as modifier 

While [NaV] are frequent and productive without any restrictions other than those caused by semantic 
clashes, they seem to be in competition with what looks like reduced forms [NV], that is, a noun 
modified by a bare verb without the nominalizing prefix. Typical examples of [NV] constructions 
include lexicalized expressions with specific meaning, such as ?ein-bju-do ‘the White House’
(‘house-white-HONORIFIC’) and ¢a?-ni ‘Red Shirts’, ¢a?-wa ‘Yellow Shirts’ (former political movements in
Thailand), literally ‘shirt-be.red’ and ‘shirt-be.yellow’, respectively. Apart from clearly lexicalized
expressions with partly idiosyncratic meaning not fully derivable from the parts, there are cases where 
[NV] appears interchangeably with [NaV], as in kd-haun ‘used car’ and ka-6i? ‘new car’ besides
ka-Pahaun and ka-PaGi?, with the same meanings. As with [NaV], the internal syntactic structure and
semantic composition of these constructions are not obvious and deserve more detailed investigation.
This should lead to a better understanding of the factors favoring one or another construction in a given 
context. 

Superficially, there are constructions containing active verbs of the [NV] type where V is not a 
modifier of N, but these [NV] expressions are (usually non-headed) lexicalized compounds, and the 
relation ‘noun-modifier’ is not traceable within such constructions. Relevant examples are t/’amin-je? ‘a 
cook’ (from t’amin c'e? ‘to cook rice’), jin-maun ‘driver’ (from jin maun ‘to drive a vehicle’) and zabwé- 
do ‘waiter’ (from zabwé t’o ‘to strike tables’). In other, less frequent [NV] compounds, the V functions 
as a semantic head, as in lu-bj5-Qu-bj> ‘hearsay, rumor’ (from lu pjd Ou pjd ‘man to.speak person 
to.speak’) and lu-zui ‘group, crowd’ (from lu si ‘man to.gather’).8 N in these examples denotes the
patient (object) or agent (subject) of the verb, unlike the NV compounds under consideration here, 
where N is the semantic head and always denotes the subject of a stative V. One group of [NV] 


6 The label “nominalizing” is not entirely adequate, as the prefix is also added to already nominal bases, as in
kinship terms (Pape ‘father’, Pada ‘aunt’) and classifiers (Pakaun ‘body; CLF for animals’, apin ‘trunk, tree;
CLF for plants’), among others. The gloss NML for ‘nominal’ is used here in the examples, not specifying the
exact semantics of the nominalization.

?as'o-do is lexicalized with the HONORIFIC/ROYAL suffix -t9/do. 

In all these lexicalized items, the second part is always voiced; there is a distinction between voiced and non- 
voiced V in all these cases, the latter being a V with O combination (phrase), the former an NP. 
constructions that falls somewhere between the two is exemplified by expressions like je-de? ‘rising
tide’ (from je te? ‘water to.rise’) and ne-bu ‘heat of the sun’ (from ne pu ‘sun be.hot’). In the latter
expressions, N is the subject of V, which may or may not be stative, and V is not necessarily a modifier 
to N. All of these constructions are beyond the scope of this paper and will not be discussed further 
here. 


2.2.4 Reduplicated modifier 
The last type of N-V construction, consisting of a noun with an adjoined reduplicated verb, is 
semantically different from the previous ones, as it adds an emotional or subjective connotation to the 
modifier. This value can be augmentative (‘very X’) or diminutive (‘rather/quite X’). In both senses, 
the reduplicated attributive is frequently introduced by the particle ka?-. The formula used in this paper 
is [NVV], reflecting the more common order of elements. Examples are ?ein ci-ji ‘a rather/very big 
house’, lit. ‘house be.big-be.big’, and ka ni-ni ‘a reddish car, a quite/very red car’, lit. ‘car be.red- 
be.red’.’ In both cases, k’a?- ‘rather’ may be added, yielding Pein k'a?-ciyi and ka k'a?-ni-ni, 
respectively. 

The order of elements is generally a noun followed by the reduplicated verbal modifier, with or 
without k’a?, though Okell (1969) also gives examples with the opposite order for [NVV] and [NaV] 
constructions, as seen in the following examples.'° Okell (1969:81) states that: 


both aV attributes (when not tightly linked) and those derived by other formatives are sometimes found 
in the reverse order, i.e., with the attribute before the head, e.g., 


pja-bja Péingi as well as Peingi pja-bja, 
k'a-pja-bja Péinji as well as Peingi k'a?-pja-bja, 
Papja Peinzi as well as Peéingi Papja'! 


The reduplicated attributives, when preposed as suggested by Okell (1969), can be ambiguous as to 
their function as noun attributes or adverbs. The expression kaun-gaun ?ein-hma nei-de is most likely 
interpreted as ‘they stay home well’, rather than ‘they live in a good house’. The latter is more likely to 
be expressed as ?ein kaun-gaun-hma ne-de, the former unambiguously as Pein-hma kaun-gaun ne-de. 
This type of construction, which is semantically clearly different from the other three, is not
investigated further in this paper and is left for future analysis. In this study, we will concentrate on the
constructions of the types [VATTRN], [NaV], and [NV].


3 Syntactic and semantic comparison of noun-modifier expressions 


3.1 Different form - different function? 
The four types of verbal modifiers with nouns can be summarized as follows: 


[VATTRN]   ci-dei Pein    ‘a/the house that is big’
[NaV]      PeiNn-Paci     ‘a/the big house’
[NV]       Pein-ji        ‘a/the big house’
[NVV]      Pein-cil-pi    ‘a/the rather/very big house’
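The four patterns differ only in the order of noun and verb and in the marking on the verb. As a rough illustration, the following Python sketch assembles the interlinear glosses used in this paper for each pattern from a noun gloss and a stative verb gloss. The function name and the dictionary layout are assumptions made for this illustration; the sketch works on glosses only and makes no claim about Burmese phonology, such as voicing in close juncture.

```python
def noun_modifier_glosses(noun, verb):
    """Build the gloss line for each of the four N-V patterns.

    noun and verb are gloss strings such as "house" and "be.big".
    ATTR and NML are the gloss labels used in the paper's examples.
    """
    return {
        "[VATTRN]": f"{verb}-ATTR {noun}",   # preposed verb with attributive marker
        "[NaV]": f"{noun}-NML.{verb}",       # noun plus nominalized verb
        "[NV]": f"{noun}-{verb}",            # noun plus bare verb
        "[NVV]": f"{noun} {verb}-{verb}",    # noun plus reduplicated verb
    }


glosses = noun_modifier_glosses("house", "be.big")
for pattern, gloss in glosses.items():
    print(pattern, gloss)
```

Laying the patterns side by side in this way makes the formal cline visible: [VATTRN] carries an overt clausal marker, [NaV] a nominalizing prefix, and [NV] no marking at all.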


As can be seen from the translations given above of the different constructions, they seem to be 
semantically identical and in relevant publications are usually treated as synonymous. Thus, Okell 
(1969:68ff) translates the expressions pu-dé-je [VATTRN], je-bu [NV], and je-?apu [NaV] all invariably 


The voiceless initial of the first part of the reduplicated verb suggests that there is no close juncture here, unlike 
in [NV] compounds seen in 2.2.3. 

Expressions of the type [VVN] and [aVN] are not accepted as grammatical by all speakers and are hardly (if 
at all) found in spontaneous conversation. 

The transcription and glosses of examples from all sources have been unified. 




Papers from SEALS 30 — Jenny 


as ‘hot water’. Hopple (2011:71f) also does not differentiate between the constructions [NV] and [NaV], 
as is evident from her translation of both Pein-bju and Pein-?ap"ju as “white house’. 

In a recently published descriptive grammar of Burmese (Jenny & Hnin Tun 2016) the three 
constructions are translated as follows: ?ein-haun [NV] ‘old home, place of origin’ (lit. ‘house-be.old’), 
Pein-rahaun [NaV] ‘a/the old house’, and haun-dé Pein [VATTRN] ‘a/the old house’ (Ibid.:152). The 
difference between [NV] and [NaV] is given as “unlike bare verbs following nouns, the nominalized 
forms are not restricted to fixed expressions and are less lexicalized” (Jenny & Hnin Tun 2016:152). 
This suggests that [NV] expressions are only found in conventionalized or lexicalized compounds, not 
as a productive syntactic process. The data given below will show this to be too narrow an explanation, 
as there is some overlap between [NV] and [NaV], the former being formed spontaneously and 
productively. Most published grammars and descriptions of Burmese are silent on the question of which of
[NV], [NaV], or [VATTRN] is used in which context, presenting them as synonymous without further
explanation. The impossibility of substituting one for the other in most cases suggests otherwise, though.
In a publication about NPs in Burmese, Tagunova (1971) writes with respect to the difference between 
[VATTRN] and [NV] that 


prepositional attributes (i.e., [VATTRN] constructions, R.Y.) express an individual quality related to a 
concrete surrounding and are used to underline a quality or peculiarity of the concrete person or subject. 
Postpositional attributes (i.e., [NV] constructions, R.Y.) are used to describe qualities or peculiarities
of the whole class of the persons or subjects. They indicate a permanent quality, and are of descriptive, 
not determining nature. (Tagunova 1971:49; translation R.Y.) 


It is worth mentioning that the author ignores constructions of the [NaV] type altogether, which,
besides the vagueness of the definitions given above, does not contribute much to solving the problem
of the interrelation between the three constructions under discussion here. Regarding the semantics of
the [VATTRN] constructions, the author mainly bases her conclusion on the fact that such attributes can
have their own grammatical markers (see above, 2.2.1), which naturally makes them more concrete and
contextually adaptive than the other constructions, and as such they may sometimes correspond to
situations where an English definite article is used.

Despite the apparent synonymity of the three constructions, they almost never allow substitution 
of one by another in a concrete speech situation. The main task of this study is to find out the semantic 
differences between the constructions and the reasons why native speakers use this or that construction 
in a given context, as well as how the syntactic and semantic differences correlate. 

There are some structural differences between the three constructions. The construction [VATTRN]
differs from the other two in that it can contain any verb, active or stative, preserving grammatical
markers such as TAM and negation, e.g., séin-θé-ATTR jwe? (‘green-still-ATTR leaf’) ‘leaf which is still
green’. Syntactically, this construction is clearly clausal, and semantically transparent. In the 
constructions [NV] and [NaV] only stative verbs can occur, and only in isolated form without any verbal 
modifiers. They are more similar to European attributive adjective expressions, and it is their syntactic 
status and semantic range that is the topic of analysis in the following. 

We now turn to the syntactic features underlying [NV] and [NaV] constructions. 


3.2 [NV] and [NaV] - background 
As seen above in 2.1, nouns can be modified by other nouns, either in an attributive/associative or a
possessive relationship. The latter is usually indicated morphologically by the creaky tone on the last
syllable of the possessor, if the syllable is creakable, although with non-human possessors, the induced
creaky tone is rather rare, neutralizing the formal distinction between attributive and possessive
relations. Thus, the pattern [N1N2] can encode a nominal head with another noun as its modifier. In
this case it is invariably N2 that functions as head of the compound, N1 being the modifier. Other
possibilities are appositional collocations and additive compounds. When analyzing the [NaV]
constructions, where [aV] is taken to be a noun, these possibilities have to be taken into consideration.

Verbs, both active and stative, can take the general nominalizing prefix ?a-, resulting in any of
several possible nominal readings. The semantics of a given [aV] as event or participant nominalization
depends on the semantics of the verb as well as the concrete context, and in many cases different
meanings apply in different contexts, as illustrated above in 2.2.2. This semantic variation of [aV]
nominalizations is an important factor to be taken into account when analyzing the syntactic and semantic
structure of [NaV] expressions. Despite the wide range of possible functions of [aV] forms, they can all
be analyzed as underlyingly nominal, with different contextual readings, including adverbial and
attributive.

The third point that is relevant to the discussion is the fact that the nominalizing prefix ?a- is
dropped in certain contexts, such as compounds and NUMERAL-CLASSIFIER combinations, where the
classifier regularly loses the prefix: ?aθi ‘fruit’ (from the verb θi ‘to bear fruit’), but θaje?-θi ‘mango’
(lit. ‘mango-fruit’) (Okell & Allott 2001:256). Furthermore, there is a tendency in colloquial Burmese,
both spoken and written in informal contexts such as chats on social media, to drop the nominalizing
prefix ?a-, resulting in bare verbs appearing in the place of nominalized verbs. The following example
is taken from personal informal chats on messaging apps: sɔ-ji t'a-me ‘I will get up early.’ (lit.
‘be.early-be.big to.get.up-FUT’) for standard ?asɔ-ji t'a-me, where sɔ ‘be early’ is nominalized (lit. ‘big earliness’).
This drop of the prefix in colloquial Burmese is especially frequent in two-part compounds of
nominalized verbs, where the second part appears as a bare stem, as in ?asi-wé ‘meeting’ (from the verb
si-wé ‘to gather, meet’) for standard ?asi-?awe. A quick Google search returns 2.7 million results for
the full form ?asi-?awe ‘meeting’ and 210,000 for the reduced form ?asi-we, the former pointing to
official announcements and state media, the latter mostly to Facebook and other informal media.


3.3 Semantic range of [NV] and [NaV] 

The constructions of the [NV] type are peculiar in that quite often they represent lexicalized units, some
of which are listed as distinct words with idiosyncratic meanings, e.g.: lu-bje? ‘clown’ (lit.
‘man-to.joke’), θu-zéin (θazéin)/lu-zéin ‘stranger’ (lit. ‘person/man-be.green/fresh’), and θan-bju ‘tin’ (lit.
‘iron-be.white’). The degree of lexicalization is rather subjective and variable among speakers. Thus,
we can assume that the compilers of Burmese dictionaries include those [NV] constructions which they
think have become lexicalized, but different dictionaries are likely to take different approaches to the
same constructions. For example, in the Burmese-Russian Dictionary
(Minina & U Kyo Zaw 1976) we find such entries as lu-bu ‘dwarf’ (lit. ‘man-be.short’) and lu-wa ‘fat
person’ (lit. ‘man-be.fat/full’), both of which are absent from the Burmese-English Dictionary
(Myanmar Language Commission 1976). On the other hand, the Burmese-Russian Dictionary omits the
following entries which are present in the Burmese-English Dictionary: lu-lu? ‘unmarried person,
person at large’ (lit. ‘man-be.free’) and lu-ŋe ‘youth’ (lit. ‘man-be.young/small’). Such examples could
be multiplied and show the uncertain lexical status of these compounds. This suggests that [NV] is not
restricted to fully conventionalized or lexicalized expressions, but that speakers are rather free to employ
the construction in productive ways. We can assume that every individual speaker has his or her
own conception as to which of the [NV] constructions have become lexicalized to the extent that the
parts of the construction have lost their original meaning and therefore may not express the meaning
the speaker intends to convey. Individual differences in how lexicalized (and semantically idiosyncratic)
a given [NV] expression is perceived to be may be one of the reasons why speakers choose one or
another construction in a concrete speech context.


4 The syntax of [NaV] and [NV] 


4.1 The syntax of [NaV] 

We now proceed to the syntax of the [NaV] construction. It is rather infrequent in the formal language,
used mostly in colloquial style, but is mentioned in all relevant descriptions of Burmese. The problem
with this construction is that the component [aV], as seen above, is formally a noun and therefore,
according to basic syntactic rules, should be treated as the head within the construction [N1N2].
Syntactically, N1 should be treated as a modifier of N2, as in other [N1N2] compounds. Still, in all
relevant publications [aV] is treated as an attribute of the preceding noun. Semantically, this treatment is
perfectly logical. Indeed, in such constructions as ?ein-?ap'ju ‘a/the white house’
(‘house-NML.be.white’), mi-?ani ‘a/the red light’ (‘fire-NML.be.red’), or ŋa-?aci ‘a/the big fish’
(‘fish-NML.be.big’) it seems natural to interpret the second component of the construction as an attribute of the
preceding one, as the translations suggest, although this interpretation contradicts the basic head-final
syntactic rule of Burmese. Contrary to this view, C. F. Lehman (p.c.) insisted that [aV] is to be treated
as a noun in all cases, and an expression like lu-?aci ‘a/the big man’ must be analyzed as ‘a person-type
bigness’, that is, with N modifying the [aV]-head. Although this seems semantically counterintuitive, it
is syntactically sound.

Adding to the difficulty is the presence of clearly attributive [aV] in prenominal position [aVN],
as in ?amjan-lan ‘expressway’ (‘NML.be.fast-road’), ?amjan-jat'a ‘express train’ (‘NML.be.fast-train’),
and ?ani-jaun ‘red’ (‘NML.be.red-color’). In these expressions, there is no problem in identifying the
prenominal [aV] as a modifying abstract noun, as in ‘speed-road ~ road of speed’, ‘speed-train ~ train
of/with speed’, and ‘redness-color ~ color of redness’, respectively. The order is fixed in these cases,
that is, there is no *lan-?amjan, *jat'a-?amjan, or *jaun-?ani, which suggests that these constructions are
not relevant to the development of [NaV] expressions even if they are also to be analyzed as containing
a nominalized verb modifying the noun.

Evidence supporting the treatment of [aV] as the head of [NaV] comes from the fact that in [NaV]
constructions the N can be omitted in all contexts without the expression changing its meaning. If we
accept the rule that syntactic heads are the parts of a phrase that cannot be omitted, N should be
considered a dependent, rather than the head. If ?ein ‘house’ were the head in the following example (1a),
(1b) should be ungrammatical, though it is perfectly grammatical and idiomatic if the noun ‘house’ is
present in the speech context. Note that the English translation in this case requires the dummy head
‘one’.


1a. θu ?ein ?aθi? s'au?-ne-de.
3 house NML.new12 to.build-stay-NFUT
‘He is building a new house.’


1b. θu ?aθi? s'au?-ne-de.
3 NML.new to.build-stay-NFUT
‘He is building a new one.’


Of course, such a transformation presupposes a context in which the ‘reduced’ form of the utterance
appears. But the form [aV] can also occur on its own without any concrete context, as in (2).


2. θu ?aθi? cai?-te.
3 NML.new to.like-NFUT
‘He likes new stuff.’


A noun may also take two or more [aV] expressions at the same time, all of them having the same
referent, as in example (3), taken from modern Burmese prose written in a rather colloquial style.13


3. canɔ-go lap'e?-je ?ac'o-jo ?ap'an-jo tai?-te.
1sm.DEP-OBJ tea-liquid NML.be.sweet-ENUM NML.be.acrid-ENUM give.to.drink-NFUT
‘He gave me tea, both sweet and strong.’


12 The lexeme θi? hardly occurs in predicative function ‘be new/fresh’ but seems to be restricted to attributive
function. Therefore, we prefer the gloss ‘new’ over ‘be.new’ in this case.
13 Ludu U Hla. 2016. cana hile dayi. [I am the captain]. 2nd edition. Mandalay: Lu Du, 258.

Thus, we may draw the preliminary conclusion that the unit [aV] exhibits all relevant features of a noun,
but treating it as the head of the construction [NaV], which seems natural for nouns in this position,
goes against semantic logic. Let us therefore further investigate the syntactic nature of [NaV]
expressions. It is necessary to distinguish between several types of [NaV] constructions. One
type obviously involves a possessive relation between N1 and N2. This type of construction does not
pose any syntactic or semantic problems, and the interpretation is straightforward. This is not the case
in [NaV] expressions with modifying function, which present problems in interpreting their semantics.
As the surface structure of the different [NaV] expressions is identical, ambiguity can arise without
context, as in the following examples (4) and (5), adapted from Okell (1969:92f).


4. k'oun  ?amjin behna-pe le. 
bench NML.be.high how.many-foot CQ 
‘How many feet high is the bench?’ 


5. k'oun ?amjin ta-loun lo-jin-de.
bench NML.be.high one-CLF require-DES-NFUT
‘(He) wants a high bench.’


There is no problem with the first example as a possessive construction which could be overtly marked 
as such by the possessive marker jé. In the second one, [aV] obviously has attributive meaning. Let us 
consider another example (6). 


6. θu ?ein ?aci s'au?-ne-de.
3 house NML.be.big to.build-stay-NFUT
‘He is building a big house.’


As it stands, this sentence can also be interpreted differently, taking ?aci as independent of ?ein,
having adverbial or secondary predicative (DEPICTIVE) function, ‘he is building the house big’ (see
Himmelmann & Schultze-Berndt 2005).14 The difference becomes overt if the quantifier phrase ta-loun
‘one-CLF’ is added. The [NUM-CLF] phrase occurs directly after the noun ?ein in depictive contexts, but
after [NaV] in attributive contexts, as in (7a) and (7b).


7a. θu ?ein ta-loun ?aci s'au?-ne-de.
3 house one-CLF NML.be.big to.build-stay-NFUT
‘He is building a house, and he’s making it big.’


7b. θu ?ein ?aci ta-loun s'au?-ne-de.
3 house NML.be.big one-CLF to.build-stay-NFUT
‘He is building a big house.’


Two other revealing examples are given in (8) and (9). 


8. kɔzɔ-go ?aθi? ta-k'u k'in-de.
carpet-OBJ NML.new one-CLF to.spread.out-NFUT
‘He changed the carpet for a new one.’


9. mje?.hna-θou?-pawa-go ?aθi? ta-t'e lé-de.
face-to.wipe-towel-OBJ NML.new one-CLF to.exchange-NFUT
‘He changed the towel for a new one.’


14 A similar function is seen in [aV] expressions like ?awa sa ‘eat one’s fill’ (lit. ‘NML.be.full to.eat’), ?api lou?
‘to complete, finish’ (lit. ‘NML.be.finished to.do’), and ?asɔ pjan ‘return early’ (lit. ‘NML.be.early to.return’).

These examples are relevant to the present discussion in two respects. First, the [aV] expression
(‘NML.new’) is here separated from the noun (‘carpet’ and ‘towel’, respectively) by the object marker
-ko/go. This leads to the
conclusion that the sequences kɔzɔ ?aθi? ‘new carpet’ and mje?.hna-θou?-pawa ?aθi? ‘new towel’ do
not constitute a syntactic unit (constituent) which may be called a noun phrase. Second, the component
?aθi? (‘NML.new’) in both cases unambiguously functions as a noun, since it is followed by the
numeral-classifier phrases ta-k'u and ta-t'e, respectively. Unlike ?aci ‘NML.be.big’ in (7) above, ?aθi? cannot be
interpreted here as depictive. Rather, these examples show that [aV] has some degree of syntactic
independence of N even if it has attributive (modifying or specifying) function. This independence is
further shown by the fact that the object marker -ko/go can appear either after N
or after [aV] (with the [NUM-CLF] phrase), as seen in (10) and (11).


10. kɔzɔ ?aθi? ta-k'u-go k'in-de.
carpet NML.new one-CLF-OBJ to.spread.out-NFUT
‘He changed the carpet for a new one.’


11. mje?.hna-θou?-pawa ?aθi? ta-t'e-go lé-de.
face-to.wipe-towel NML.new one-CLF-OBJ to.exchange-NFUT
‘He changed the towel for a new one.’


A further piece of evidence for the syntactic independence of [aV] in [NaV] constructions comes from
expressions where the verb is negated. As the verbal prefix slot is occupied by the nominalizing prefix ?a-
in [NaV], the negator ma- is blocked from this position. The alternative nominalizing strategy with
suffixed -ta/da, commonly used to nominalize clauses, occurs in this context, as seen in (12).


12. je ?a?e ta-loun, ma-?e-da ta-loun pe-ba.
water NML.be.cold one-CLF NEG-be.cold-NFUT.NML one-CLF to.give-IMPORT
‘Give me one bottle of cold water, one not cold.’


The parallel occurrence of ?a?é ‘cold one’ and ma-?é-da ‘not cold one’, both specifying je ‘water’, shows
that they are not, syntactically speaking, modifiers (i.e., dependents) of the nominal head, but heads
themselves. The clausal nominalization suffix -ta/da is a merged form of the attributive suffix -té/dé
and a dummy noun ha ‘thing’ (Yanson 2005:233f). This type of nominal expression never occurs as a
modifier of another noun, unlike [aV], which in prenominal position is best analyzed as a modifier of a
following (but not of a preceding!) noun, as in ?amjan-lan ‘expressway’ seen above.

This leads to the conclusion that [aV] is a syntactically free element in a clause, rather than a bound
part of a [NaV] construction. Semantically, it may be more or less closely bound to the N, or it may
have adverbial or depictive function in the clause. Following this argumentation, we may say that [NaV]
constructions are not NPs in which either N or [aV] is the head and the other the dependent, but rather that
both are independent syntactic units. This means that, unlike in Lehman’s analysis presented above,
[NaV] as in ?ein ?aci ‘a/the big house’ should not be interpreted as ‘house-like bigness’, but rather as
‘house, a big one’. N and [aV] are thus seen as standing in an APPOSITIONAL relation, rather than a
MODIFIER-MODIFIED relation. This analysis is perfectly consistent with the wide range of meanings of
[aV] derivations, as seen above in 2.2.2. It also explains the fact that the choice of classifier is triggered
by N, rather than by an abstract [aV], as seen in examples (4) and (5) above. If ?amjin refers to the
height of the bench, the appropriate quantifier complement is the measure word pe ‘foot’, but if the
same [aV] expression functions as modifier to ‘bench’, the classifier loun (for counting round objects,
fruit, pieces of furniture, etc.) must be used with the numeral. The analysis of [NaV] as an appositional
expression ‘X, a/the Y one’ also explains the possibility of dropping N and keeping only [aV] without a
change in meaning. Furthermore, appositional expressions may become more tightly bound in some cases,
resulting in compounds that behave like lexicalized items. The change may be gradual and affect only
some, usually very frequent, collocations. This process can be seen in kinship terms combined with
proper names, which may become lexicalized into a single intonation unit, as in English ‘my uncle,
John’ becoming ‘my Uncle John’. In Burmese, the appositional expression do s'aja p'e win ‘our teacher,
Hpe Win’, with an intonational break between s'aja and p'e win, may be compared to the more closely
bound s'aja p'e win ‘Teacher Hpe Win’ with no intonational break. The latter is still less tightly
integrated phonologically than an [NV] expression like s'aja-ji ‘headmaster, big teacher’
(‘teacher-be.big’), as seen in the different voicing behavior.

To sum up the syntactic structure of [NaV] expressions, we propose that [aV] is a syntactically
independent nominal form relating semantically in different ways to N (as attributive or depictive)
or V (as adverbial). In attributive function in [NaV] expressions, it is formally an apposition to N, with
which it may become more closely bound in some cases.


4.2 The syntax of [NV] 

As seen in 2.2.3 above, [NV] expressions can have several functions, some with rather idiosyncratic
semantics. Here we only consider [NV] noun phrases where V modifies N. Unlike [NaV], [NV]
constructions do not show any overt morphosyntactic linker between the noun and the modifying verb.
Semantically, as in [NaV], the expression is a noun, with the postnominal verb as a modifier. Therefore,
the question of the internal structure arises, as it does for the [NaV] constructions. Again, if N is the head
and V the modifying dependent, the structure violates the basic head-final rule of Burmese as seen in
[N1N2] constructions. Unlike [NaV], though, [NV] can hardly be analyzed as a symmetric appositional
construction, as the two components are not of the same category and cannot be taken as coreferential.
The only hint of syntactic boundness in [NV] expressions is the fact that V is generally voiced in these
constructions, showing it to be in close “juncture” with N (Okell 1969:12). As mentioned above, voicing
is not always a sure diagnostic for syntactic relations, as, firstly, it is not indicated in the orthography,
and, secondly, there is possible regional and individual variation in many cases. On the other hand,
standard dictionaries do indicate voicing in the pronunciation (e.g., Myanmar Language Commission
1976, San Lwin 2010), so at least for the [NV] expressions listed in the dictionaries, there is reliable
evidence for the role of voicing. The close morphosyntactic boundness indicated by voicing in [NV]
expressions generally corresponds to their semantic closeness.

There are two possible ways to explain the internal structure of closely bound [NV]. First, one may
take them as derived from fuller [NaV] expressions, with the nominalizer ?a- dropped. We have seen
above that this omission of ?a- occurs frequently in the informal language. The development would then
be from appositional [N]+[aV] to attributive [NaV] and compound [NV], the last step commonly
occurring in expressions with conventionalized meaning. If this path of development is correct, we
might expect several in-between cases, that is, variation between [NaV] and [NV], the latter possibly with
varying voicing.

The other possible explanation of [NV] attributive expressions is to take V as a direct verbal modifier,
indicating an inherent or permanent feature of N, which explains the close boundness. The position of
adjectival modifiers with respect to their nominal head is variable in many languages across the world
and does not necessarily correspond to the general headedness of the language, allowing for
asymmetric orders (Dryer 2013). The different orders may have different historical reasons in different
languages, and they may be indicative of different degrees of lexicalization.

Both analyses are plausible, and in the absence of obvious counterevidence, it is possible that both are
adequate for different [NV] expressions, with no single development leading to all cases in the present
language. The available diachronic data suggest that [NV] is the earliest pattern of modified nouns,
with [NaV] appearing only later.15 Together with the idiosyncratic semantic development of at least
some [NV] expressions, this makes a strong case for [NV] actually being an old pattern in the language.
The synchronic data from colloquial Burmese, on the other hand, show how the pattern can arise
spontaneously in the present-day language. We may therefore conclude that two distinct paths of
development merge in modern Burmese [NV] expressions, one from adjectival function of V in original
[NV] compounds, the other from shortening of [NaV] to [NV] in non-formal speech.

The question that remains to be answered is when and why [NV] and [NaV] are chosen by a speaker. 
This will be discussed in the next section. 


15 The inscriptional data of Burmese must be taken with care, as only a very specific (high) genre is represented
in writing until very recently. The absence of a construction from the data does not necessarily mean the
construction did not exist in the language at the time.




Papers from SEALS 30 — Jenny 


5 Factors triggering the choice of [VATTRN], [NV], and [NaV]

With three noun-modifier constructions that are treated as synonyms by most grammatical descriptions, 
the factors triggering the choice of one over the others are not obvious. Taking the premise for granted 
that difference in form entails difference in meaning/function, however subtle, we should be able to 
detect the differences in the semantics of the three forms from their distributional behavior. 

The case of [VATTRN] seems to be the clearest, as this construction allows for all kinds of
grammatical markers, making the modifying part syntactically and semantically more independent and
transparent than in the other two types. [VATTRN] constructions are the most common and are adequate
in situations where a specific referent is described, modified, or specified. Verbal modifiers can be
added only in [VATTRN] constructions, as in ?alun mjin-θɔ tai? ‘the/a very tall building’ (‘very
be.high-ATTR building’) and séin-θé-θɔ ?ajwe? ‘the/a still green leaf’ (‘be.green-yet-ATTR leaf’).

With the possibility of adding aspectual and other verbal modifiers, [VATTR] does not necessarily
refer to permanent or inherent qualities of N but may rather express momentary attributes. In formal
Burmese, [VATTRN] is the default construction, corresponding to both relative clauses and attributive
adjectives in English. The N modified by [VATTR] may be either specific or non-specific, corresponding
to English definite or indefinite article constructions, depending on the context. [VATTRN] may
therefore be seen as the default choice, which is theoretically always possible but is not conversationally
the most adequate choice in some contexts.

The construction [VATTRN] is usually used in situations where the speaker wants to point out a 
specific quality of a specific referent or group of referents, often in direct opposition to another possible 
quality, as in (13). 


13. ci-dé ŋa ma-jaun-bu.
be.big-ATTR fish NEG-to.sell-NEG
‘I don’t sell the big fish (but I do sell the small ones).’


A customer at the market is choosing fish and points to a big one; for some reason, the seller does not
want to sell this particular fish. Using the [NV] construction ŋa-ji (‘fish-big’) in this situation would
mean that the seller does not sell big fish in general, just small ones. Less clear is the difference from
[NaV], which might be interpreted as the seller talking about either a specific big fish or the fish he sells
in general.

In (14), the conventionalized expression lu-zo ‘bad person, bandit, villain’ is used with its specific
meaning, while in the second clause the more transparent [VATTRN] kaun-dé lu ‘a/the man who is good’
appears instead of lu-gaun, which is a lexicalized expression meaning a ‘dignified person’, rather than
just an ordinary good man. In this situation lu-gaun would sound too elevated.


14. lu-zo-dwe-né ma-s'e?.s'an-né, kaun-dé lu-dwe-né-bé
man-bad-PL-with NEG-to.associate-PROH be.good-ATTR man-PL-with-EXCL
s'e?.s'an po.
to.associate INSIST
‘Don’t associate with scoundrels, associate with good people.’


Similarly, in (15), the [VATTRN] construction is appropriate in a concrete context where the mother 
warns her child not to drink water that is hot at the moment, rather than referring to hot water in general 
which the child should avoid. The latter would be expressed by [NV] as je-bu (‘water-hot’), rather than 
[VATTRN]. 


15. pu-dé je ma-θau?-né.
be.hot-ATTR water NEG-to.drink-PROH
‘Don’t drink (the) hot water.’


Sentence (16) is appropriate in a situation where a person has a choice of cars, but the one he usually
drives is under repair, so he decides to drive the red one instead.


16. di-lo s'o-jin ŋa ni-dé ka-bé si-me.
PROX-as to.say-if 1s be.red-ATTR car-EXCL to.ride-FUT
‘In this case, I will take the red car.’


The construction [NV] ka-ni (‘car-be.red’) would be appropriate in a situation where a person is dreaming
about a high post and the privileges that come with it, one being a ‘red car’ as a recognized symbol of
prosperity. Not all possible [NV] or [NaV] collocations actually occur in the language, which also affects
the choice of construction in a concrete context. For example, the following
constructions do not exist: *lu-mjan (‘man-quick’) and *lu-zɔ (‘man-early’), although they have typical
[NV] structure and seem semantically possible. In both cases, only [VATTRN] is available to represent
the meaning ‘the/a fast person’ and ‘the/an early comer’, respectively.

While the semantic difference between [VATTRN] and [NV] is clear in most cases, the same cannot
be said for [VATTRN] and [NaV], nor for [NV] and [NaV]. In most examples above, [VATTRN] could
be replaced by [NaV] without obvious change of meaning, though the former tends to sound more
formal than the latter. Based on the distribution of [NV] and [NaV] seen so far, it is possible to
provisionally hypothesize that constructions of the [NaV] type are used in situations where the
speaker considers the construction [NV] to be lexicalized, so that its components have lost the
literal meaning which the speaker wants to express. Compare the following expressions in (17a-b) and
(18a-b).


17a. θu ?ein ?aci s'au?-ne-de.
3 house NML.be.big to.build-stay-NFUT
‘He is building a big house.’


17b. θu ?ein-ji s'au?-ne-de.
3 house-big to.build-stay-NFUT
‘He is building a big house.’


18a. ka-?ahaun ka-?aθi? jaun-we-jin
car-NML.be.old car-NML.new to.sell-to.buy-NML
‘sale of used and new cars’


18b. ka-haun ka-θi? ?ajaun-?awe
car-be.old car-new NML.to.sell-NML.to.buy
‘sale of used and new cars’


In both cases, the [NV] patterns (?ein-ji and ka-haun/ka-θi?) are not seen by the speaker as having
idiosyncratic (lexicalized) meaning, so they are free to choose either form.16 A different situation can
be seen in the following examples: θa ?aci ‘the/an elder son’ (‘son-NML.be.big’), θami ?aŋe ‘the/a
younger daughter’ (‘daughter-NML.be.young/small’). The speaker uses the construction [NaV] instead
of the seemingly natural [NV] for the obvious reason that the construction [NV] would be interpreted
as ‘big son’ and ‘small daughter’, respectively, that is, with a specific kinship meaning, rather than as a
descriptive attribute. The [NV] forms are comparable to proper nouns, mainly used when addressing one’s
children. This means that productive [NV] is blocked where there is an [NV] expression with a
conventionalized meaning, but it is possibly available where no lexicalized [NV] exists in the speaker’s
lexical inventory.

16 The choice of [NV] and [NaV] in examples 18a and 18b may be triggered by the form of the nominalizer of
the activity verbs, avoiding non-parallel ?a-nominalizations in the same expression (*ka-?ahaun ka-?aθi?
?ajaun-?awe).

Another relevant example is the difference between ka-ji and ka ?aci ‘big car’, as well as between
ka-ŋe and ka ?aŋe ‘small car’. The forms ka-ji and ka-ŋe are used on signboards at toll gates in Myanmar,
referring generally to ‘big cars’ (i.e., trucks and buses) and ‘small cars’ (i.e., personal cars),
respectively, as distinct categories of vehicles. The [NaV] constructions are used when talking about
big and small specimens of the general category ‘car’. Again, the compounds ka-ji and ka-ŋe have
specific meanings and can be taken as lexicalized, although they do not necessarily appear in dictionaries.

The following pairs of expressions further illustrate the semantic difference between [NV] and 
[NaV]. 


[NV]       gloss             [NaV]        gloss                      Components
lu-ji      ‘adult’           lu ?aci      ‘big person’               lu ‘man’, ci ‘be big’
lu-ŋe      ‘youth’           lu ?aŋe      ‘young person’             lu ‘man’, ŋe ‘be young/small’
mjo-haun   ‘ancient city’    mjo ?ahaun   ‘old city, former city’    mjo ‘city’, haun ‘be old’
je-nwe     ‘hot green tea’   je ?anwé     ‘hot/warm water’           je ‘water’, nwe ‘be hot, warm’
le?-θi?    ‘copy’            le? ?aθi?    ‘new hand’                 le? ‘hand’, θi? ‘new’
le?-haun   ‘original’        le? ?ahaun   ‘old hand’                 le? ‘hand’, haun ‘be old’
hin-jo     ‘soup, broth’     hin ?ac'o    ‘mild, sweet curry’        hin ‘curry’, c'o ‘be sweet’


The last example is especially telling, as it is not uncommon to hear orders in restaurants like hin-jo
?asa? ‘hot, spicy soup’, where there is an apparent semantic clash between the opposites c'o ‘be sweet’
and sa? ‘be hot, spicy’.

From the examples given above, it appears that [NV] expressions are generally lexicalized compounds with
specific meanings, functioning in some cases like proper nouns (θa-ji ‘older son’), denoting categories
(ka-ji ‘big vehicles, trucks’), or having idiosyncratic meanings (hin-jo ‘soup, broth’). In this respect,
[NV] expressions are not fully productive, though there is individual variation in terms of what is
acceptable as [NV]. [NaV],
on the other hand, is an open construction that can be used in a wide range of contexts when referring to
a specimen with the described qualities. [NaV] is the colloquial counterpart of the formal [VATTRN]
construction, which in colloquial style occurs mostly when the V is modified in some way. Having said
this, we have to accept the fact that colloquially [NaV] expressions may become [NV] spontaneously
by dropping the nominalizing prefix ?a-, as happens frequently in informal speech and writing. This
shortening of [NaV] to [NV] may or may not show voicing of V, depending on the speech style and
speed, and the shortening may be blocked if there is a clearly lexicalized [NV] expression with
conventionalized meaning and the speaker consciously wants to avoid ambiguity.


6 Conclusion and outlook 

We can conclude that the three [NOUN-ATTRIBUTE] constructions investigated in this paper, while being
nearly synonymous, exhibit important differences in their syntax and contextual semantics. The most
transparent type, [VATTRN], is the default construction, especially in the formal and literary language, 
to combine a noun with any kind of verbal attribute. [NV] constructions in the literary language are 
always lexicalized compounds with specific meaning. If the verbal attribute of a noun is not further 
modified or specified by aspectual or other markers, the colloquial style prefers the construction [NaV], 
where [aV] is syntactically a nominalized verb, which can have a wide range of meanings. The most 
common meaning in the constructions discussed in this paper can be labeled ‘the/a X one’, referring to 
a participant, rather than to the quality itself (as in ‘X-ness’), which stands in appositional relation to the
N. In the colloquial language, [NaV] may be reduced to [NV], except in cases where there is a 
conventionalized [NV] compound. 

The fact that formal and informal styles are not always clearly separate, and that the styles may be
mixed in many contexts, leads to an overlap of [VATTRN] and [NaV] without obvious semantic
difference between the two. On the other hand, the possible shortening of [NaV] to [NV] results in the 
two forms not being clearly distinguished in the colloquial style. 
To further substantiate the analysis given in this paper, a larger corpus-based investigation covering
different language styles should be made in the future. The actual use and distribution of the forms 
under discussion can only be verified by in-depth research in a wide range of textual genres, including 
spontaneous speech in informal settings. 
Abstract

This paper describes the morphosyntactic functions of the case marker -niu in Liangmai, a
Tibeto-Burman language spoken in Northeast India. The agentive marker -niu is used
optionally with the subjects of both transitive and intransitive clauses, especially when the
subject acts volitionally. We attempt to explain the difference between clauses in which -niu is
omitted and those in which it is present. The paper also serves as an update to the previous reports on
Liangmai case marking in Charengna (2014:399), Mataina (2018:18) and Daimai (2018).


Keywords: Liangmai, case, agentive, volition 
ISO 639-3 code: njn 


1 Introduction 

In this paper, we describe the system of case marking associated with -niu in Liangmai and specify
its syntactic and pragmatic functions. Many Tibeto-Burman languages have been described as having a
system of ‘optional’ or ‘pragmatic’ case marking (Coupe 2011, DeLancey 2011, LaPolla 1995, and
Chelliah 1997). The case marker -niu is also used optionally in Liangmai, and it is found to mark both
A and S arguments depending on pragmatic conditions. It is typically found on A(gent) arguments of
transitive clauses and on the A arguments of causative clauses formed with intransitive verbs (Daimai
& Raguibou 2020:35). However, it is also reported to be omitted from highly agentive A arguments
and can even be used with non-agentive A/S(ubject) arguments (Mataina 2018:19). The marker -niu is
usually absent in intransitive clauses. Earlier works on the Liangmai case marking system by Charengna
(2014), Mataina (2018) and Daimai (2018) have described -niu as nominative, agentive and ergative,
respectively. This paper is an update to earlier reports on the Liangmai case-marking system, including
our SEALS XXX 2021 video presentation, and an attempt to gain a better understanding of the case
marking system of Liangmai.


1.1. The People

The Liangmai are one of the Naga tribes living in India’s northeastern states of Manipur and Nagaland. In
Manipur, they live in Tamenglong, Senapati and Kangpokpi districts, and in Nagaland, they are found
in the Peren district. The total population of the Liangmai as per the 2011 Census of India is
49,811. The Liangmai people were previously known by an incorrect ethnonym: until recently, the
term ‘Kacha Naga’ was used to refer to the Liangmai and Zeme in Manipur. This misnomer was, however,
officially done away with in Manipur by the Constitution (Scheduled Tribes) Order (Amendment) Act in
2011. The Liangmai in Nagaland, however, are still known as the Zeliang, along with the Zeme. The
Liangmai share ethnic, cultural and linguistic affinity with the Zeme and Rongmei. These three tribes are
commonly referred to as Zeliangrong, a term formed by putting together the first syllables of the names of the three
tribes. They are believed to be descendants of a common ancestor. There are numerous cognate root
words shared among their languages as well. They live in a contiguous area spreading across Assam, 
Manipur and Nagaland. Bradley (1997) also used the term ‘Zeliangrong’ as a subgroup within the 
Southern Naga of the Kuki-Chin-Naga branch. 
1.2. The Language

In Grierson’s Linguistic Survey of India (1903), Liangmai, called Kwoireng or Liyang, was treated as 
belonging to the Naga-Kuki sub-group, though no description of the language was provided. More 
recently, Burling (2007) grouped Liangmai under his Zeme group along with Zeme and Rongmei. 


Map 1. Map of North East India showing the Liangmai-inhabited area (adapted from
https://www.mapsofindia.com/maps/northeast/sevensisters.htm ) 
In our elicited data, the basic constituent order found in pragmatically neutral declarative clauses is
SOV. We rely on elicited examples because such examples tend to have all of the participants overtly
realized, which is frequently not the case in natural discourse. In example (1), we find an intransitive
construction with the third-person subject preceding the verb ‘laugh’. In example (2), we see the first-person
agent preceding the object, with the verb in clause-final position. When there are two objects,
the primary object precedes the secondary object followed by the verb, and the primary object is overtly
marked. In example (3), the agent is marked with -niu, and the primary object is overtly marked with
the dative morpheme -tu.


1. pa nui-bam-mei
3SG laugh-PROG-DECL
‘He is laughing.’

2. i tsalit tat-néi
1SG field go-IRR
‘I will go to the field.’

3. nay-niu a-tu rankan pi-jéi
2SG-AGT 1SG-DAT money give-DECL
‘You give the money to me.’


Liangmai has a binary mood system contrasting a zero-marked realis with an overtly marked irrealis
mood. The unmarked realis mood corresponds to actualized events as in (3), while the irrealis suffixes
-néi and -rabo are used to encode non-actualized events as in (2). The language also marks aspect,
such as progressive, as in (1), and perfective. As is commonly found in other Tibeto-Burman languages,
the agentive marker -niu in Liangmai has the same form as the instrumental marker, as illustrated in (4).


4. pa-niu tati-tu sin-tay-niu dap-jéi
3SG-AGT dog-DAT wood-CL-INSTR beat-DECL
‘He beat the dog with a stick.’


1.3. Data sources 

There are dialectal differences in Liangmai, but not to the extent that speakers cannot understand one
another’s varieties. We observed that, in terms of mutual intelligibility between southern Liangmai
village varieties like Thalon and Namtiram and northern Liangmai village varieties like Kuilong
and Maguilong, the differences amount to very little. These differences lie mainly in
accent and some lexical items. Each Liangmai village has a different accent. For instance, in the lexical
domain, the Thalon village variety calls ‘uncooked rice’ [k3ban], while the northern village variety uses the
term [tsdban] for the same. There is no significant difference in the segmental phonology as per our
observation. In this study, most of the sentence data were created by us as native speakers of Liangmai,
and the spoken variety of these sentence data is based on northern village varieties like Kuilong,
Maguilong and Lemta and southern varieties like Thalon. In addition, we also used sentences
produced by Mataina (2018) and Daimai and Raguibou (2020) and cited them in the sentence examples
wherever required.


2 Overview of case markers in Liangmai 

We claim that there are ten case markers in Liangmai. Charengna (2014:398) reported nine case markers,
as reproduced in Table 2. Mataina (2018:18) reported ten case markers, some with variant forms, including one
case marker absent from earlier reports, the terminative, as listed in Table 1. The labels
used for the case markers differ partially between the two studies.
Table 1: Case markers in Liangmai (as reported in Mataina 2018:18)


Case name           Markers                  Semantic function
Agentive            niu                      agent; contrast
Accusative/Dative   tu                       affected patient / recipient
Genitive            gu                       possessor
Instrumental        niu / peep / k'uiluziu   instrument
Comitative          nai                      accompany
Benefactive         ley                      benefit
Locative            ga                       location
Ablative            gasu / ganiu             source, source of action
Allative            lam                      movement towards a place
Terminative         katay                    movement towards a time/place


Table 2: Case markers in Liangmai (as reported in Charengna 2014:398) 


Case name Markers 
Nominative niu 
Accusative tu 
Instrumental niu 
Dative ley 
Locative ga~lam 
Genitive gu 
Ablative gasu/ganiu~lamsu 
Sociative nai 
Benefactive ley 


3 Usages of the marker -niu in Liangmai

For the purpose of this paper, before we look at the role of the marker -niu, it is useful to look
at the four patterns of core case marking as given by Dixon (1979, 1994) and cited in Coupe (2007).
In this section, we also discuss the various terms used to label the case marker -niu in earlier
reports. The four patterns of core case marking that are common in the world’s languages are the
following:
• Nominative-accusative - The A argument and the S argument are marked in similar ways but
differently from the O argument.

• Ergative-absolutive - The O argument of a transitive clause and the S argument of an intransitive
clause have similar marking, while the A argument is marked differently.

• Split-S system - S arguments of ONE SET of intransitive verbs get a morphosyntactic treatment
similar to the A argument of a transitive verb, while the S argument of ANOTHER SET of
intransitive verbs gets a morphosyntactic treatment similar to O arguments of transitive clauses.

• Fluid-S system - Similar to split-S (i.e., having a set of intransitive verbs that marks S like A,
and another set of intransitive verbs that marks S like O) except that in the fluid-S, the S
argument gets case marking based on semantic factors.


The position of the Liangmai agentive case marker in relation to the four types given by Dixon
varies. It does not strictly follow any one of the above patterns, but it appears to follow
each of them partially, as can be seen from the examples discussed in the following
sections.
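
As an illustration only, the four patterns listed above can be sketched as a small decision procedure. The Python sketch below is not taken from Dixon or Coupe; the function name classify_alignment, the input format, and the labels "A-like" and "O-like" are assumptions introduced here. It assumes that each intransitive verb is listed with the set of treatments its S argument can receive, and it treats a verb whose S can receive either treatment as the signature of a fluid-S system.

```python
def classify_alignment(s_treatments):
    """Classify a toy core-case-marking system.

    s_treatments maps each intransitive verb to the set of treatments its
    S argument can receive: "A-like" (treated like the A of a transitive
    clause) and/or "O-like" (treated like the O of a transitive clause).
    These labels are hypothetical and used for illustration only.
    """
    allowed = {"A-like", "O-like"}
    observed = [set(t) for t in s_treatments.values()]
    if not observed or any(not t or not t <= allowed for t in observed):
        raise ValueError("each verb needs a non-empty set of 'A-like'/'O-like'")
    if all(t == {"A-like"} for t in observed):
        return "nominative-accusative"   # S always patterns with A
    if all(t == {"O-like"} for t in observed):
        return "ergative-absolutive"     # S always patterns with O
    if any(len(t) == 2 for t in observed):
        return "fluid-S"                 # same verb allows both treatments
    return "split-S"                     # verbs fall into two fixed sets
```

For example, classify_alignment({"run": {"A-like"}, "fall": {"O-like"}}) returns "split-S". A system like the one described for Liangmai, which follows each pattern only partially, would not fit cleanly into any single output of such a classifier.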

Charengna (2014) labelled -niu as a nominative-case marker. In Meitei, Bhat and Ningomba (1997) 
also labelled na as a nominative case marker, but for the same language, Chelliah (1997) considers it to 
be an agentive marker. In Liangmai, Mataina (2018) labelled the same marker as an agentive marker, 
while Daimai (2018) labelled the same marker an ergative marker. Following LaPolla (1995), Coupe 
(2007, 2011) and Teo (2012), we use the label ‘agentive’ in this paper. LaPolla uses the term agentive
for marking on a subject argument that acts willfully or volitionally, but he uses the term ergative
when the marking of transitive subjects is systematic and based on the syntactic pattern. Also, we do not label the
marker -niu here as a nominative case marker because it does not mark A or S arguments that are subject
experiencers, as a nominative marker would. The nominative case is therefore not morphologically marked in
Liangmai. As stated earlier, it may be noted that the agentive morpheme -niu is homophonous with an 
instrumental case marker in Liangmai. 


3.1 Commendation and accountability 

The marker -niu in Liangmai is essentially used to identify the ‘doer’, so as to give a person credit
or blame. Otherwise, in a pragmatically neutral situation, the marking of -niu is optional (Mataina
2018:21; Daimai 2018), as shown in (5).


5. natsi-piu Ø marui kamsat-lu k’ai bam-méi, nay ri-k'ai-lau
2SG.POSS-brother-GEN-AGT hen kill-PERF keep PROG-DECL, 2SG burn-POL-IMP
‘Your brother had killed a chicken and kept (it); you burn (it).’ (lit. ‘Clean it by burning to cook.’)


However, the agentive marker is mandatory in certain pragmatic situations, such as in a response to the question
‘Who did it?’. The same obligatory usage is reported for Ao (Coupe 2007).
Example (6) is an interrogative sentence and (7) is an answer to the
question in (6). In a context other than a response to a content question, a speaker would insert the marker -niu if
he or she wants to intentionally indicate the doer (to give credit or blame).


6. sau(-niu) marui kamsat lau
who(-AGT) chicken kill Q
‘Who killed the chicken?’

7. lunt'snbau-niu¹ marui kamsat-le
Lungthonbou-AGT chicken kill-DECL
‘Lungthonbou killed the chicken (yesterday).’


¹ The tone of the agentive marker is generally influenced by the tone of the noun it marks (Mataina 2018).
In (6), it is normal for the questioner to omit the agentive marker. Omission generally, though not necessarily,
means that the questioner is not concerned about who prepared the chicken for the family. However, inserting
the agentive marker in a context such as (6) definitely implies that the questioner wants to identify who
did the work. In a pragmatically neutral situation, the agentive marker is not required at all. This is demonstrated
by the habitual statements in (8) and (9).


8. indi matsay kabak-tu gii-jéi
1S day-always pig-ACC feed-DECL
‘I feed the pig every day.’

9. a-kik"tin-lén tinkum matsdy pastor-piu kak'éy thiu-jéi
1S-POSS-family-BEN year always pastor-male prayer do-DECL
‘The pastor prays for our family every year.’


Coupe (2007:159) describes a context in which an agentive marker was not required in Mongsen Ao, a
Naga language, even when a questioner was inquiring what Coupe had done with the tea that had been
made for him (as the tea was gone). When Coupe replied with the Ao example (10), he was corrected
by the questioner (his Ao language consultant), who said that it was wrong to use the agentive marker in that
context. The marker is likewise not required in Liangmai. However, in Liangmai it is not wrong to use the agentive marker
in such a context; it is optional, as in (11). Our understanding is that
the agentive marker -niu in Liangmai in particular, and agentive markers in North East Indian languages like Mongsen Ao
(Coupe 2007) and Sumi (Teo 2014) in general, are used to point out the doer clearly.


10. ni na saya tfamuku (Mongsen Ao) 
ni no sana tfamu-uki 
1SG AGT tea drink-ANT 
‘I’ve drunk the tea.’ (Coupe 2007:159)


11. i-(niu) tsa sak-lu-dei
1SG-(AGT) tea drink-PST PERF-DECL
‘I’ve drunk the tea.’


3.2. Volition and power 

In earlier studies on North East Indian languages, Bhat and Ningomba (1997) for Meetei
and Teo (2014) for Sumi, among others, reported that the agentive marker is used to indicate volition. We observe that
in Liangmai the agentive marker is likewise used to indicate volition and, in addition, ability and
power. However, there are some exceptions, which we will see in a moment. In the intransitive
clause in (12), the agentive marker is not used, but in the same clause in (13), it is used to show
the intention of the speaker: he coughs intentionally to get attention or to distract someone.


12. i makhiu-jei
1SG cough-DECL
‘I coughed.’

13. i-niu makhiu-jei
1SG-AGT cough-DECL
‘I coughed (intentionally).’


The agentive marker -niu is also used to indicate power or ability. It occurs, for instance, in a situation
where no one is willing to perform a certain task due to its challenging or dangerous nature, or with a kind of
work that people are too lazy to perform. Examples are shown in (14) through (18).
14. 3-na-niu sefai mai-tu 3lam-néi
1SG-POSS-son-AGT army man-ACC contact-IRR
‘My son will contact the army.’

15. i-niu tsakay-tsapiay k'iu-néi
1SG-AGT plate-cup wash-IRR
‘I will do the dishes.’

16. zisu-niu sintsdi-ga sai-k'ai-jéi
Jesus-AGT cross-PP die-POL-DECL
‘Jesus died on the cross.’

17. kaktiu mai-niu kam-ra-bau gu-jei
rich man-AGT do-PURP-NOM POSS-DECL
‘This is (something) supposed to be done by a wealthy person.’

18. na-pdu karin-bau tin-hai, pd-niu tsakui min-li-jéi
2SG-grandpa PRE-alive-NOM time-DEF 3SG-AGT PRE-lion catch-PST-PERF-DECL
‘While your grandpa was alive, he (even) caught a lion.’


However, the claim that the marker -niu in Liangmai is used to indicate volition and ability is
complicated by (19). In (19), the subject experiencer shows no volition in sweating, but
the agentive marker -niu is still used. In (19), the agentive marker is optional, but it is more
natural to mark the subject. However, this must be taken as an exception because this type of occurrence
is very rare. Chelliah (1997:124) also pointed out that in Meetei, the agentive marker na is easily found
in sentences that describe unintentional situations, which goes against Bhat and Ningomba’s
(1997) claim that na is used to mark volition.


19. hdiga i-niu kasoyziu sdira kum bam-bau-gd, dekam tsami t'tuziu lau? 
hdi-ga i-niu kasoy-ziu sdai-ra 
PROX-LOC 1SG-AGT hot-COPM die-PURP 


kum-bam-bau-gda dekam  tsa-mi_ tiu-ziu lau? 
like-PROG-NOM-CONJ why NRL-fire make-CPM-QPTCL 
‘Here, I am about to die due to heat and why are (you) making fire?’ (Mataina 2018) 


3.3. Subject experiencer 

An agentive marker cannot be present when the A occurs as a subject experiencer and has no control 
over the action (Mataina 2018:19). In examples (20) to (22), all the actions are caused by a natural 
phenomenon or by an accident. 


20. 3-wdn thiu-jei
1S-stomach pain-DECL
‘My stomach aches.’

21. luythanbau tsa-khdau-ga kali wan-mi-dei
PN NRL-cleft-LOC roll go-PERF-DECL
‘Lungthonbou has fallen on the cleft.’ (Literally: rolled and went)

22. ta-diti zau-bam-mei
NRL-water drip-PROG-DECL
‘Water is dripping (from the roof).’


In contrast, the agentive marker can also be optional with a subject experiencer, as shown in (23), in which the
S argument is a subject experiencer. In this kind of construction, the speaker intends to point
out the reason why someone is upset. Otherwise, in a normal context, the agentive marker -niu is not
required, as shown in (24).


23. pa-niu na-tu pa-tsun sa-bau ra,
3S-AGT 2-ACC 3SG-heart bad-NOM DEF
nay maliu mak-bau donniu-jéi
2S talk NEG-NOM reason-DECL
‘The reason he’s upset with you is because you are silent.’


24. pa na-tu pa-tsun sa bam-mei
3S 2-ACC 3SG-heart bad PROG-DECL
‘He is upset with you.’


There are certain types of intransitive verbs (i.e., those whose S argument is a subject experiencer) that do
not allow the agentive marker. Such verbs are kak/uin ‘shiver’, nk’ay ‘to fall’, kap ‘to cry’, and nui ‘to
laugh’, as shown in (25). No specific reason is known as to why these verbs do not allow the agentive
marker. One reason could be that with these verbs the subject argument experiences the action
of an outside agent. However, if any of the actions denoted by such intransitive verbs causes
distraction or trouble to others, the agentive marker may be inserted. In (26), the questioner was
disturbed by the noise of crying the previous night, so he asked a content question.
In (27), someone responded that it was Lungthonbou who cried the previous night. Here, Lungthonbou, the subject
experiencer, is not at fault for disturbing someone the previous night, but in order to
assign blame or indicate the doer, the agentive marker -niu is used. This is related to the discussion
in §3.1, particularly of the response to a query like ‘Who did it?’.


25. lunthsnbau kap-jéi 
Lungthonbou cry-DECL 
‘Lungthonbou cries.’ 


26. i naliu-tu kap-tu-lau tiu din-néi, sdu-niu kalutsin kap-ziu-lau
1S 2P-ACC cry-NEG-IMP CONJ tell-DECL who-AGT last night cry-COMP-Q
‘I told you (all) not to cry; who cried last night? (It disturbed me.)’


27. lunthsnbau-niu kap-jéi
PN-AGT cry-DECL 
‘Lungthonbou cried (last night).’ 


3.4 Causative construction 

Additionally, marking A arguments appears more natural in causative sentences, but the marking is still
optional, as shown in (28) and (29), where (28) is more natural. More examples of causative sentences
are given in (30) and (31).
28. ape-niu katsa kdm-thiti-e
grandma-AGT tea CAUS-hot-DECL
‘Grandma made the tea hot/warmed the tea.’ (Daimai and Raguibou 2020:38)


29. ape-(niu) katsa kam-thiti-jei
grandma-(AGT) tea CAUS-hot-DECL
‘Grandma made the tea hot/warmed the tea.’


30. i-niu lunsiliu-tu kam-nui-jei (transitive, causative clause)
1SG-AGT Lungsiliu-ACC CAUS-laugh-DECL
‘I made Lungsiliu laugh.’


31. i-niu luysiliu-tu ppi-tsagan lay-jéi (ditransitive, causative clause)
1SG-AGT PN-ACC CAUS-curry cook-DECL
‘I make Lungsiliu cook the curry.’


3.5 Inanimate marking 
The marker -niu also marks inanimate subjects. Such constructions with the agentive marker, as shown
in (32) and (33), appear more natural than those with an unmarked inanimate subject.


32. naimik-niu kabun kam-nuy-midéi
sun-AGT ice CAUS-melt-PERF-DECL
‘The sun caused the ice to melt.’


33. kau wan-bau sin-ban-niu pa-tu dap-jéi
fall come-NOM wood-CL-AGT 3S-ACC hit-DECL
‘The fallen tree has hit him.’


3.6 Contrast or comparison 

The marker -niu is also used to indicate contrast or comparison, as shown in (34) and (35). In addition,
the presence or absence of the agentive marker -niu seems to signal a difference in meaning, as shown
in (36) and (37). The semantic difference between (36) and (37) cannot be
distinguished easily by a native speaker. However, (36) appears to be a
general warning not to venture into the bushes because a snake or possibly some insect
may bite the addressee, while (37) hints that there is nothing to harm him except the snake.


34. hdi-si-niu raykayn maddi, wiii-si-niu raykay maniu
PROX-DET-CONTR money four PROX-DET-CONTR money five
‘This is four rupees, that is five rupees.’ (Literally: this costs Rs. 4 and that costs Rs. 5)
35. lunthonbau-niu pa-lity kawi-pui nd, asemliu-niu pa-lun
PN-CONTR 3SG-heart good-(GEN) child PN-CONTR 3SG-heart
kasd-pui nda-jei
bad-(GEN) child-DECL
‘Lungthonbou is the child of the compassionate mother, and Asemliu is the child of the cruel
mother.’ (Mataina 2018)


36. manam hay-ga yut-tu-lau, kaniu-Ø matsau-né
bushes under-LOC enter-NEG-IMP snake-(AGT) bite-IRR
‘Do not enter into the bushes, the snake will bite (you).’ 


37. manam  hay-ga yut-tu-lau, kaniu-niu matsau-né 
bushes under-LOC enter-NEG-IMP snake-AGT bite-IRR 
‘Do not enter into the bushes, the snake will bite (you).’ 


4 Conclusion 

In the foregoing discussion, we attempted to identify the role of the case marker -niu in Liangmai. The
agentive marker in Liangmai is generally optional. It is basically used to indicate the doer.
Mandatory marking happens only in certain pragmatic contexts, especially in a response to a content question. The response
to a content question can be either a transitive or an intransitive clause, and it can be a past,
present or future statement. The marker is also used to indicate volition or power. However, there is an
exception to its indicating volition and power, as seen in (19). This occurrence is very
infrequent, and we therefore consider it an exception. Further investigation of the usage of
the agentive marker for volition will be an important future direction.

In addition, the agentive marker -niu is also used to indicate contrast or comparison. With regard
to a subject experiencer (i.e., when an A argument is a subject experiencer), there are two kinds of context:
one where agentive marking is entirely absent and one where agentive marking is optional. In a
causativized construction, agentive marking appears more natural, but again it is optional. The
highly optional nature of the agentive marker and its multi-functional role make it difficult to pin down a
definite role beyond what has been discussed in this paper. This paper will hopefully make some
contribution to our endeavor to understand and study the core case marking system in the Trans-Himalayan
languages.


Abbreviations 

COMP complementizer
DEF definite marker
POL polite marker
PURP purposive marker 
Abstract

In Kapampangan, there are two ways to form the perfective aspect: one involving the infixation of <in>
between the first consonant and first vowel (V1) of the stem (Del Corro 1980), and one
involving changes in the V1 of stems. Previous studies described these changes as
replacives <e:> and <i:>, i.e., they replace the V1 of the stem (Del Corro 1980:47-66;
Forman 2019:65, 73). We argue that these changes echo Himes’ (2012) diphthong
reduction and Del Corro’s (1980, 1988) monophthongization. Thus, instead of having three
replacive infixes <e:>, <i>, <i:>, Kapampangan uses the infix <i>, which produces several
forms based on the V1.


Keywords: Kapampangan, morphophonemics, diphthong reduction, perfective, 
monophthongization 
ISO 639-3: pam 


1 Introduction 
Kapampangan (ISO 639-3 pam), a Central Luzon, Malayo-Polynesian language, forms the perfective 
aspect in several ways. One of these is using the infix <in>, as can be seen in (1). 


(1) ..linawe nakuy marok mata... 
l<in>awe=na=ku=n=marok mata 
PFV-look=GEN.3SG=ABS.1SG=LK=angry eyes
‘...he looked at me with angry eyes...’


This morpheme is inserted between the first consonant and the first vowel (V1) of the stem. The use of
<in> to mark the perfective, as in Ilokano, or sometimes the inceptive aspect, as in Tagalog and other
Central Philippine languages, is typical of Philippine languages (Blust 2013:385-86).
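
The placement of <in> described above can be illustrated with a minimal Python sketch. This is an illustration only, not part of the analysis: the function name infix_in is hypothetical, the sketch works on plain orthographic strings, and it assumes that V1 is simply the first of the letters a, e, i, o, u in the stem. For a vowel-initial stem the sketch places the infix at the very beginning, which is a simplification.

```python
VOWELS = set("aeiou")  # assumption: orthographic vowels only


def infix_in(stem, infix="in"):
    """Insert the infix immediately before the first vowel (V1) of the stem."""
    for i, ch in enumerate(stem):
        if ch in VOWELS:
            return stem[:i] + infix + stem[i:]
    return stem  # no vowel found: leave the stem unchanged


print(infix_in("lawe"))  # linawe, matching l<in>awe in (1)
```

The sketch covers only the <in> strategy; the vowel changes discussed next are not modelled by it.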

Besides the use of <in>, there are other ways described in previous studies to mark the perfective aspect
in Kapampangan. Forman (2019) refers to these as vowel changes, examples of which are given in
(2) and (3).


(2) Mintd ya Menila (Forman 2019:68). 
m-(p)(u)<i>nta=ya Menila 
INTR-<PFV>-to go=3SG.ABS Manila 
‘She went to Manila.’ 


(3) Mémipi ku ndpun (Forman 2019:69). 
m-(pa)<e>N-(p)ipi=ku napun 
INTR-STEM-<PFV>-to.do.laundry=1SG.ABS yesterday 
‘I did laundry yesterday.’ (‘I did the wash yesterday.’)


The perfective aspect is formed by changing the V1 of contemplative/infinitive forms. (2) and (3) have the
infinitive forms munta and mamipi, respectively. The V1 of each verb stem is replaced by another
vowel; that is, [u] is replaced by [i] in (2), and [a] is replaced by [e] in (3) (Forman 2019:65). Other
studies of verb forms in Kapampangan (Bergaño 1916; Mirikitani 1971; Del Corro 1980) take similar
positions, although they do not use the same term for the process. According to Mirikitani, the choice
between <in> and the so-called vowel changes/difference in vowel quality depends on the phonological
shape of the stem (1971:66), and Del Corro (1980:37-68) discussed in detail which affixes/processes
are used in specific environments. Del Corro refers to the perfective morphemes in (2) and (3) as
replacives.


(4) [u] → [i]
mugse ‘will throw’ → migse ‘threw’ (Del Corro 1980:40)

[a] → [e]
danuman ‘will water’ → de:numan ‘have watered’ (Del Corro 1980:57)


(4) shows examples of the perfective replacives Del Corro mentioned. As in Forman’s
description, replacives replace the V1 of the stem. For stems with [u] as their V1, the <i> perfective
replacive is used, while <e> is generally used for stems with [a] as V1. Aside from the transformations
mentioned above, there are also instances wherein <i> replaces [a].


(5) maldug ‘is dropping’ → mildug ‘dropped’ (Del Corro 1980:43)


Bergaño also made the same observation but without identifying which vowel is used in replacing a
specific vowel. (6) shows some examples of the conjugation of the perfective aspect found in Bergaño
(1916:19, 44).


(6) musngi ‘to open something’ → misngi ‘have opened something’
isalicut ‘to hide (something/somewhere)’ → selicut ‘have hidden (something/somewhere)’


For stems that have [i] as V1, the infix <in> may be used (Mirikitani 1971:67). An example is given in
(7). In some stems, [i] is simply lengthened (Mirikitani 1971:66; Del Corro 1980:51), as in (8).


(7) diligan ‘to water something’ → dinilig ‘to water something’ (Mirikitani 1971:67)
(8) misakab ‘fall down/trip’ → mi:sakab ‘fell down/tripped’ (Del Corro 1980:51)


To summarize, Kapampangan uses several strategies to mark the perfective aspect. These are listed
in (9). The specific environments in which these affixes are used can be found in Del Corro (1980:37-66).


(9) a. infix <in>
b. replacives <i, e>
c. lengthening of [i]


The analyses discussed above seem at odds with the fact that Philippine languages usually mark the 
perfective aspect through infixation. In this paper, we will present an alternative analysis for the 
perfective aspect marking specifically in (9b) and (9c), wherein the aforementioned replacives and 
vowel lengthening are unified into a single infix <i>, taking into account known sound changes in 
Kapampangan, namely, diphthong reduction. 


2 Diphthong Reduction 

Diphthong reduction, also known as monophthongization (Del Corro 1980, 1988), is a phonological 
process wherein diphthongs in Kapampangan occurring in a closed syllable or word-final position are 
simplified (Himes 2012:499). Himes traces the changes from Proto-Malayo-Polynesian (PMP) to Proto- 
Central Luzon (PLuzC) to Pre-Kapampangan (Pre-KPM). 
(10) *aj → [e:]
PLuzC *?eRbun → Pre-KPM *aybun → [e:bun]
PMP *beRni → Pre-KPM *bayni → [be:ni]
PMP *balay → Pre-KPM *balay → [bale]
(Himes 2012:499)


(11) *uj and *iw → [i?]
PMP *ikuR → Pre-KPM *i:kuy → [i:ki?] ‘tail’
PMP *qapuR → Pre-KPM *a:puy → [?a:pi?] ‘lime’
PMP *saliw → Pre-KPM *sa:liw → [sa:li?] ‘to buy’
(Himes 2012:500)


(10) shows the change Pre-KPM *aj → [e], while (11) shows the change Pre-KPM *uj, *iw → [i?]. The
diphthongs *uj and *iw have the same pair of vowels ([u] and [i]), albeit in different sequences. This
may suggest that the vowel pairs and not the vowel sequences determine the resulting forms. Although
not directly relevant to the current study, the diphthong [aw] also underwent reduction to [o] in
Kapampangan (Himes 2012:500; Del Corro 1980:25), as shown in (12).


(12) *aw → [o]
PMP *babaw → Pre-KPM *ba:baw → [ba:bo] ‘above’
PMP *la:naw → Pre-KPM *la:naw → [la:no] ‘housefly’
(Himes 2012:500) 
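
The word-final correspondences in (10) through (12) can be summarized as a small lookup, sketched here in Python for illustration. The function name reduce_final_diphthong and the table REDUCTIONS are hypothetical. The sketch assumes the transcription used in the cited forms (diphthongs written ay, uy, iw, aw; glottal stop written ?) and handles only word-final diphthongs; the medial cases, which also involve vowel length, and the initial glottal stop in [?a:pi?] are left out.

```python
# Word-final diphthong -> reflex, following (10)-(12); illustration only.
REDUCTIONS = {"ay": "e", "aw": "o", "uy": "i?", "iw": "i?"}


def reduce_final_diphthong(form):
    """Apply word-final diphthong reduction to a Pre-KPM form."""
    form = form.lstrip("*")  # drop the reconstruction asterisk
    for diphthong, reflex in REDUCTIONS.items():
        if form.endswith(diphthong):
            return form[: -len(diphthong)] + reflex
    return form  # no word-final diphthong: unchanged


for pre_kpm in ["*balay", "*i:kuy", "*sa:liw", "*ba:baw", "*la:naw"]:
    print(pre_kpm, "->", reduce_final_diphthong(pre_kpm))
```

Note that *uy and *iw share one reflex in the table, which mirrors the observation that the vowel pair rather than the vowel sequence appears to determine the outcome.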


Some of the Pre-Kapampangan forms Himes cited are actually attested in earlier documentation of
Kapampangan such as those recorded by Bergaño (1860). (13) provides a few examples.


(13) Pre-KPM *balay ‘house’ cf. Bergaño balay ‘house’
Pre-KPM *ba:baw ‘above’ cf. Bergaño babao ‘above’
Pre-KPM *la:naw ‘housefly’ cf. Bergaño langao ‘housefly’


Other words bearing word-final diphthongs not mentioned by Himes (2012) and found in Bergaño are
sablai ‘to hang clothes’, lacao ‘remove’, and tacao ‘gluttony’.

Although non-word-final diphthongs are unattested in Kapampangan, there is reason to believe
that diphthongs in words like *aybun, *bayŋi (Himes 2012:499) did exist at some point. As evidence,
words like lelay ‘edge’ and tetay ‘bridge’, which are found in Bergaño (1860), have cognates in other
Philippine languages that possess medial diphthongs (cf. Tagalog and Botolan Sambal laylay ‘hem,
edge’ and taytay ‘bridge’). This points to the possibility that diphthong reduction in Kapampangan came
in two waves. The word-medial diphthongs were likely the first to undergo the change, followed
by a second wave of reduction in word-final diphthongs. The initial wave may have been
completed by the time Bergaño (1860) started his study of Kapampangan.

Meanwhile, the second wave of diphthong reduction affecting word-final diphthongs was ongoing
during Bergaño’s study. In Bergaño (1860), citation forms are spelled with diphthongs, and the
pronunciation guides that follow the lemmas also mark them as such: “(Dipt.)”. Some of the entries also
provide sample sentences, and they display utterance-medial diphthong reduction.


(14) (a) BALAY (Dipt) Noun. House ... pibalebalay, those that are presented in
pamibalebalay, every dowry, e.g., cattle ... (Bergaño 1860:38)


(b) LACAO. (Dipt.) Irregular verb. ... Mamacao, mecao, macao, to go. Mi, N. Milaco ya
ngeni (Bergaño 1860:124)


(c) TETAY (Dipt.) Noun. Bridge. ... tinetayan, the bridge in which one passes, or the
river in which does so as well (Bergaño 1860:252)


(d) TACAO (dipt.) Noun. Inappropriate desire to eat and drink; gluttony ...


The sample sentences from each entry are given in (15); (15e), however, is taken from Bergaño (1916).


(15) (a) Yian pibalebale co (Bergaño 1860:38)
    yan      pi-balay~balay=ku
    ABS.MED  PL-REDUP-house=1SG.GEN
    ‘these are my dowry/trousseau’

(b) Milaco yangeni (Bergaño 1860:124)
    mi-lakaw=ya         ŋeni
    INTR-going=3SG.ABS  now
    ‘he left today’

(c) Mirugtungne tetay (Bergaño 1860:252)
    mi-dugtuŋ=na=ya               tetay
    INTR-linking=3SG.GEN=3SG.ABS  bridge
    (metaph.) ‘he has come’

(d) Mirugtungne ing tetena (Bergaño 1860:252)
    mi-dugtuŋ=na=ya               iŋ=tetay=na
    INTR-linking=3SG.GEN=3SG.ABS  SG.ABS=bridge=3SG.GEN
    (metaph.) ‘he has come’

(e) Dacpan me ing mapanacao (Bergaño 1916:52)
    dakap-an=mu=ya                   iŋ=mapanakaw
    apprehending-TR=2SG.GEN=3SG.ABS  SG.ABS=thief
    ‘catch the thief’

The appended forms of the roots in the sentences in (15b) and (15d) are found in medial position: they 
are followed by clitic pronouns. Notice, however, that the citation forms and other derivations and 
inflections have diphthongs, as in each entry: pibalaybalay ‘dowry, trousseau’, panlacao ‘was 
removed’, and tinetayan ‘bridge that was crossed’. Utterance-final diphthongs also remain as is. This 
is shown in sentences like (15c) and (15e). When they are found in the medial position, they are reduced. 
This is also discussed in Bergaño (1916):

“... when they [diphthongs] come in the middle, the vowel [pairs] are not clearly pronounced; rather,
they merge, for example pamanlacao: one does not say pamanlacao mo, but pamanlaco mo, neither
does one say balay mo, but rather bale mo; hence, ay sounds more like e than a, like palay, pale mo,
and ao sounds more like o, like galao, pamangalo mo.” (Bergaño 1916:3; translated from the original
Spanish)


The unreduced forms can still be seen in the sentence Manlacao cayo ‘you all go’ (1860), and Ala con 
petay a tauo ‘I did not kill anyone’ (1916). 

While Himes (2012) and Del Corro (1980) noted that diphthong reduction only occurs on *aj and 
*aw, evidence suggests that this simplification process can also occur in [ia] and [ua] clusters, similar 
to how *uj and *iw were both reduced to [iʔ]. Examples (16) and (17) illustrate this.


(16) paburian ‘to disregard’ (Bergaño 1860:66) > paburen
(17) manwas ‘to wash’ (Del Corro 1988:17) > manas


The form manwas is seen in some varieties of Kapampangan like Calaguiman-Mabatang. This 
particular variety has retained word-final diphthongs possibly due to the community’s relative isolation 
from other Kapampangan-speaking communities (Cruz, de la Rosa, Pelagio & Quizon 2020). 


3 Kapampangan Perfective Infix <i> 

In this section, we present an alternative to the past analyses of the seemingly complex and 
unpredictable formation of perfectives through replacives and vowel lengthening: these morphemes are 
products of phonological changes brought about by the infixation of another perfective affix <i>, which 
then underwent diphthong reduction as discussed in the previous section. 


3.1 Stems with [u] as V1
When stems whose V1 is [u] (e.g., kutang ‘question’) are inflected with the perfective infix <i>, the
diphthong formed is reduced to [iʔ] and further to [i] or [i:]. This process is illustrated in (18).


(18) [iu] > [i:]
kutaŋ > *kiutaŋ > *kiʔtaŋ > ki:taŋ ‘asked (a question)’
lukas > *liukas > *liʔkas > li:kas ‘took off (clothes)’


Del Corro (1980:42) noted that utterance-medial [ʔ], and in effect word-medial ones, are dropped and
replaced instead with compensatory lengthening. (18) illustrates this point: the loss of the stop caused 
the lengthening of the preceding vowel [i]. This typically occurs in stems whose first syllable is open. 
For stems with a closed first syllable (i.e., CVC syllable structure), compensatory lengthening is 
suppressed. Examples are shown in (19). 


(19) [iu] > [i]
buklat > *biuklat > *biʔklat > biklat ‘opened’
kumbiran > *kiumbiran > *kiʔmbiran > kimbiran ‘invited’


3.2 Stems with [i] as V1 
For stems with [i] as V1, the perfective aspect is marked by the lengthening of [i] (Mirikitani 1971:66;
Del Corro 1990:51). This can also be explained by the infixation of <i>, as illustrated in (20).


(20) [ii] > [i:]
miras > *miiras > mi:ras ‘arrived’
misabi > *miisabi > mi:sabi ‘talked’
minum > *miinum > mi:num ‘drank’
mipunta > *miipunta > mi:punta ‘went’


As we can see in (20), the lengthening of [i] is due to the juxtaposition of two consecutive [i] resulting
from the infixation of <i>. The sequence of two [i] is then expressed as vowel lengthening or 
compensatory lengthening. 

However, there are instances wherein the infixation of <i> does not produce length. This usually 
occurs in closed syllables similar to instances in (19). 


(21) midlip > *miidlip > *mi:dlip > midlip ‘napped’


3.3 Stems with [a] as V1 

As previously illustrated in (16), the cluster [ia] produces the same change as the diphthong *aj in Himes
(2012). The infixation of <i> triggers diphthong reduction in stems whose V1 is [a] and thus changes
the V1 of the stem to [e] in marking the perfective aspect.


(22) [ia] > [e:]
mate > *miate > me:te ‘died’
paʔulyan > *piaʔulyan > pe:ʔulyan ‘sent home’
manabu > *mianabu > me:nabu ‘fell’
mangan > *miangan > me:ngan ‘ate’


Reduction to [e:] usually occurs in stems with open first syllables. The vowel length in (22) is possibly
caused by compensatory lengthening. However, in closed syllables, [ia] is reduced to [i], and 
compensatory lengthening is suppressed, as in (19) and (21). 


(23) [ia] > [i]
lakwan > *liakwan > *le:kwan > likwan ‘left’
damdam > *diamdam > *de:mdam > dimdam ‘heard’
makmul > *miakmul > *me:kmul > mikmul ‘swallowed’
ʔakit > *ʔiakit > *ʔe:kit > ikit ‘saw’


In stems with the prefix pag-, the resultant monophthongs are either [i] or [e]. In [ia] > [e], notice that
compensatory lengthening is suppressed as well.


(24) [ia] > [i], [e]
magobra (m-(p)ag-obra) > *miagobra > *me:gobra > megobra, migobra ‘worked’
magkwentu (m-(p)ag-kwentu) > *miagkwentu > *me:gkwentu > megkwentu ‘related’
pagsikapan > *piagsikapan > *pe:gsikapan > pegsikapan ‘worked hard on’
pagmulalan > *piagmulalan > *pe:gmulalan > *pegmulalan > pigmulalan ‘wondered about’
magaluk > *miagaluk > migaluk, me:galuk ‘offered’
maglipat > *miaglipat > miglipat, me:glipat ‘transferred’
magdala > *miagdala > migdala, me:gdala ‘brought’

Alternations between [i] and [e] can be observed in the perfective forms resulting from stems with /ia/.
Forman (2019) noted that this alternation between [i] and [e], which stems from recent changes in the
language, has given some vowel pairs a semi-contrastive status. It was also observed that [i] and [e] are less
clearly contrastive. In some words, such as ante ‘where’ and anti ‘like’, they are contrastive, but in
words like piru and pero ‘but’, they are not. These recent changes are probably contact-induced.

As Himes (2012) and Del Corro (1980) discussed, [e] is the product of the reduction or 
monophthongization of the diphthong *ay. This [e] can be considered phonemic as evidenced by the
minimal pairs provided by Forman:


(25) /i/, /e/
a. [de] < [da] ‘3PL.GEN’ + [ya] ‘3SG.ABS’
b. [di] ‘plural core marker’


The [e] in (25a) is a product of diphthong reduction that contrasts with [i] in di, showing that /e/ can be
considered a phoneme, albeit one formed only recently. However, /e/ shares the same vowel space
with the lowered variant of /i/, which occurs in the final syllable of a word, or in the final syllable of an
utterance with high pitch, as in questions (Del Corro 1980:8). This overlap between /i/ and /e/ can be
explained by (1) the relatively recent formation of /e/ as a phoneme and (2) Kapampangan
distinguishing vowel phonemes by tongue position rather than tongue height (Del Corro 1980:8).
Similarly, the overlap between these front vowels may also explain why [i] occurs instead of [e] in
stems with [a] as V1 that are infixed with <i>, as can be seen in (24). This might also explain the apparent
aberration in (23), wherein the preference for one alternant, in this case [i], has been fossilized.


4 Conclusion 
The analysis presented above has taken into consideration the general diachronic patterns and 
morphophonemic processes present in Kapampangan and other Philippine-type languages. Since the 
infixation of the perfective affix <in> is established in Kapampangan, it follows that its counterpart <i>
behaves similarly, meaning that this morpheme should also be an infix. The allomorphy observed when
<i> is affixed to stems is probably due to diphthong reduction, a phonological change in Kapampangan
that previous studies associate only with diphthongs in word-final position. We believe this alternative
to be more satisfying than the so-called vowel change and replacive vowels invoked in previous
analyses.

Based on these, we summarize the morphophonemic rules of infix <i> in (26). The infix is inserted 
between the first consonant and first vowel of the stem, with the resultant form varying depending on 
its V1 and syllable structure. 


(26) when V1 is [a], it becomes [e:], but
when V1 is [a] and the syllable has a CVC structure, it becomes [i];
when V1 is [u], it becomes [i:], but
when V1 is [u] and the syllable has a CVC structure, it becomes [i];
when V1 is [i], it becomes [i:], but
when V1 is [i] and the syllable has a CVC structure, it does not change.
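
The rules in (26) can be read as a lookup keyed on V1 and on whether the first syllable is closed. The Python sketch below is only an illustration under stated assumptions, not part of the analysis itself: stems are given in plain orthography and begin with a single consonant, ng is treated as one consonant, and a first syllable is counted as closed when V1 is followed by two consonants. The names RULES, segments and perfective are hypothetical. The sketch does not model the [i] ~ [e] alternation in pag- stems shown in (24), nor glottal-initial stems such as the last item in (23).

```python
VOWELS = set("aeiou")

# Surface realization of <i> + V1, keyed by (V1, first syllable closed?),
# following the summary in (26).
RULES = {
    ("a", False): "e:", ("a", True): "i",
    ("u", False): "i:", ("u", True): "i",
    ("i", False): "i:", ("i", True): "i",
}


def segments(stem):
    """Split an orthographic stem into segments, treating 'ng' as one consonant."""
    out, i = [], 0
    while i < len(stem):
        step = 2 if stem.startswith("ng", i) else 1
        out.append(stem[i:i + step])
        i += step
    return out


def perfective(stem):
    """Return the perfective form of a CV-initial stem according to (26)."""
    segs = segments(stem)
    if len(segs) < 2 or segs[0] in VOWELS or segs[1] not in VOWELS:
        raise ValueError(f"expected a CV-initial stem, got {stem!r}")
    v1 = segs[1]
    # Assumption: the first syllable is closed if two consonants follow V1.
    after = segs[2:4]
    closed = len(after) == 2 and all(s not in VOWELS for s in after)
    if (v1, closed) not in RULES:
        raise ValueError(f"(26) does not cover V1 = {v1!r}")
    return segs[0] + RULES[(v1, closed)] + "".join(segs[2:])


for stem in ["kutang", "buklat", "mate", "lakwan", "miras", "midlip"]:
    print(stem, ">", perfective(stem))
```

Run on stems drawn from (18) to (23), the sketch returns the attested perfectives for the open-syllable and closed-syllable cases alike, which is the sense in which a single infix <i> plus diphthong reduction covers the three patterns in (9).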

Abstract

Porohanon, spoken in the Municipality of Poro, Camotes, Cebu, is a member of the Central
Bisayan branch of the Bisayan complex (Zorc 1977). Previous descriptions of this speech 
variety (Wolff 1967, Zorc 1977, Ballo 2011) have tended to classify its common noun 
markers into NOMINATIVE, GENITIVE, and OBLIQUE case forms. These forms are also 
purported to encode distinctions of DEFINITE versus INDEFINITE and SPECIFIC versus 
NONSPECIFIC, notions which fall under what Balogh, Latrouite, and Van Valin (2020) call 
“nominal anchoring”. The current study re-evaluates the functions and present-day forms 
of these common noun markers using written and spoken data. An alternative classification 
is proposed in this paper’s conclusion. The syntactic alignment of Porohanon is also 
reassessed considering more contemporary research on ergativity in Philippine languages. 


Keywords: Porohanon, nominal anchoring, definiteness, specificity 
ISO 639-3 codes: prh 


Introduction 

Nominal anchoring (Balogh, Latrouite, and Van Valin 2020) is a vital component of any language.
Because language is a human system employed to refer to entities in the real world or to participants in a
situation, it draws on various linguistic resources and structures to meet these needs. Notions traditionally
associated with nominal anchoring, such as definiteness and specificity, have enjoyed wide coverage and
exhaustive discussion in the philosophy of language and theoretical linguistics literature (cf. Balogh,
Latrouite, and Van Valin 2020 for a cursory survey).

This paper examines the common noun markers of Porohanon, spoken primarily in the Municipality
of Poro, Camotes, Cebu, Philippines. The common noun markers are analyzed for their nominal 
anchoring functions in written and spoken data. The present study is intended to be another building 
block toward a more comprehensive grammatical description of Porohanon, which is relatively 
understudied and underdescribed compared to other varieties and Philippine languages of wider 
communication, such as Cebuano and Waray. 

Zorc’s monumental The Bisayan Dialects of the Philippines: Subgrouping and Reconstruction
(1977) proved to be an indispensable resource for the current study. One can even say that the work has
already laid out the main lines of inquiry and has documented the most significant aspects of the
grammars of Bisayan varieties. Apart from addressing lacunae in the Porohanon data acknowledged by the
author up-front (Zorc 1977:269, 276), another niche that the current study intends to occupy is presenting
findings from naturalistic speech data. While traditional sentence elicitation may have already
uncovered the basic structures at work, it is my belief that transcriptions of continuous speech
recordings could yield valuable complementary observations. Several language-specific resources were
also consulted and are cited throughout the rest of the paper.


List of Abbreviations:

1 – first person; 2 – second person; 3 – third person; A – the most agentive core argument of a transitive verb;
APPL – applicative; ABS – absolutive; CAUS – causative; COMPL – completive; CONJ – conjunction; DEF
– definite; DIST – distal; E – extended argument; ENUM – enumerative; ERG – ergative; EXCL – exclusive;
GEN – genitive; HAPP – happenstance; HES – hesitation pause; INDF – indefinite; INF – infinitive; INTR –
intransitive; IPFV – imperfective; IRR – irrealis; LNK – linker; LOC – locative; NONSPEC – nonspecific; O
– the most patientive core argument of a transitive verb; OBL – oblique; PL – plural; PFV – perfective; POS
– postposed form; PRE – preposed form; PROX – proximal; PST – past; Q – question word; REAL – realis; S
– the single core argument of an intransitive verb; SG – singular; SPEC – specific; STEM – stem-forming
affix; STAT – stative; TR – transitive; = – clitic boundary; - – morpheme boundary


Referential expressions in Philippine languages 
McFarland (1978:151) operationalizes reference based on his definition of what a referent is, namely, 
a “non-linguistic entit[y] which [is] talked about”. Reference, therefore, is “the linguistic process 
whereby referents are identified.” 

Referential expressions, or “reference expressions”, in Philippine languages tend to fall into four
types: (1) common referential expressions, (2) personal referential expressions, (3) personal
pronouns, and (4) deictics (McFarland 1978:141). Table 1 shows how these types are distinguished in
McFarland’s model of Philippine syntax. This paper focuses only on the first type, specifically the
common referential expressions in Porohanon.


Table 1: McFarland’s (1978:154) classification of referential expressions

Categories | Definitions
Common Referential Expression | “Names of objects and places are marked, and treated syntactically, as common reference expressions.”
Personal Referential Expression | “In Philippine languages, labels which are attached to persons and personified beings (personal names) are marked as personal referential expressions.”
Personal Pronouns | “Personal pronouns distinguish referents on the basis of the speaker-addressee relationship.”
Deictics | “Deictic pronouns distinguish referents on the basis of the spatial relationship (nearness or remoteness) to the speaker, and perhaps the addressee”


On the structure of common referential expressions, McFarland (1978:141) states that “A CRE 
[Common Referential Expression] (something like a common noun) in its most general form consists 
of a CRE article and a predicate phrase”. 

Reid and Liao (2004:464), meanwhile, describe referential expressions as “strongly right-branching,
with heads preceding modifiers”. This right-branching tendency dictates that “determiners”
(as per Reid and Liao 2004:464, roughly equivalent to “articles” in McFarland 1978 and Balogh,
Latrouite, and Van Valin 2020) appear before the so-called “head noun”.³


Porohanon common noun markers 

Table 2 lists the forms of the common noun markers in Porohanon. They are categorized according to 
the core cases ABSOLUTIVE (ABS), ERGATIVE (ERG), and GENITIVE (GEN) with the OBLIQUE (OBL) marker 
in the final column. The forms in boldface have already been identified by Wolff (1967:66) and Zorc 
(1977:85) and are, for the most part, corroborated by the data I have gathered. The rest of the alternant 
forms not in boldface are discussed in the following subsections. 


2 A terminological (and perhaps, conceptual) note: McFarland (1978) has elaborated on his choice of using
“reference expressions” or “REs” rather than the more common (but also less theory-neutral) label “noun
phrase” or “NP”. I tend to agree with McFarland’s choice and will be employing this label throughout the
paper, even if more contemporary studies such as Reid and Liao (2004) still use the label “noun phrase/NP”.
3 Reid (2002) has argued elsewhere that these monosyllabic forms prevalent in Philippine languages commonly
analyzed as “determiners” or “articles” are better taken to be the head nouns in these constructions, and that
the following verb or noun serves as its complement.

Table 2: Porohanon common noun markers

ABS | ERG | GEN | OBL
an [ʔan] / ang [ʔaŋ] | san [san] / sa [sa] | san [san] / sa [sa] | sa [sa]
in [ʔin] / =y [j] | sin [sin] | sin [sin] |


Ergativity in Porohanon 

Firstly, a comment on syntactic alignment. In previous studies (Santiago 2018, 2019), I have labelled 
the case forms of Porohanon referential expressions as NOMINATIVE (NOM)-GENITIVE (GEN)-OBLIQUE 
(OBL) following the categories adopted by past studies. 

This topic has been reviewed by a number of studies in the past. Wolff (1967) employs a 
NOMINATIVE-GENITIVE-LOCATIVE classification for the “construction markers” of Porohanon. 
Meanwhile, Zorc (1977:69) writes, “The case system of Bs [Bisayan] nominals includes three 
categories: nominative, genitive, and oblique. [emphasis added]”. He adds that NOM forms occur 
mainly as “topics of a clause”, his phrasing for describing the most privileged syntactic argument. 
Meanwhile, there is considerably more nuance to the types of nominals that are cast as GEN and OBL 
according to their “form, meaning, distribution, and use” (Zorc 1977:69). McFarland (1978:141),
analyzing Bikol Legazpi and Tagalog, employs a similar case inventory, calling these “basic surface
forms for each type of RE [reference expression]”. Ballo’s (2011) undergraduate capstone research
project on Porohanon essentially adopts Wolff’s (1967) inventory of “construction markers”.

The classification in Reid and Liao (2004) is identical to that of Wolff (1967): “We choose to 
distinguish between case forms such as NOMINATIVE, GENITIVE, LOCATIVE, etc. marked either 
morphologically (i.e., by the actual form either of the nominal itself or one of its co-constituents), or 
syntactically (i.e., by word order), and case relations, namely PATIENT, AGENT, CORRESPONDENT,
MEANS, and LOCUS” (Reid & Liao 2004:434). Worth noting at this juncture is that these researchers do
analyze Philippine languages as exhibiting an ERGATIVE syntactic alignment, a claim absent in the
studies cited previously. However, they prefer the term NOMINATIVE over the term
ABSOLUTIVE to refer to the “least indispensable complement of a basic predication” (Reid & Liao
2004:435). They also describe NOMINATIVE as the “typologically more general term”.

The present stage of my research on Porohanon has enabled me to put forward a more informed 
stance on the issue. The variables “S, A, O”⁴ are used to categorize the various referential expressions
and their relation to the predicate of a clause. In a slight modification of Dixon’s (1968, 1972) original
formulation, the variables are defined as follows:


• S – the single core argument⁵ of an intransitive verb
• A – the most agentive core argument of a transitive verb
• O – the most patientive core argument of a transitive verb


Following this set of variables, one of the parameters identified for prototypical transitive constructions 
is a distinct source of action (A) apart from the most affected entity (O) (Nolasco 2009:9). An 
intransitive construction, on the other hand, involves the convergence of the “source of action/ most 
agentive core argument” and the “most affected entity/ most patientive core argument” on a single 
argument of the clause, thus labelled (S). Non-core arguments (those least “immediately-involved”⁶
arguments in a clause which cannot be considered S, A, or O) are encoded as obliques or OBL.
Typically, spatial and temporal expressions are encoded as obliques.


4 This system was first introduced by Dixon (1968, 1972) as a heuristic for demonstrating the different types of
syntactic alignment exhibited in the marking of arguments in Dyirbal in contrast to Indo-European languages.
5 This modification of Dixon’s original formulation is done to acknowledge Mithun and Chafe’s (1999)
reservations with the commensurability of the notion of ‘subject’ for typologically diverse languages.
6 This characterization of “immediacy of involvement” in the action/event expressed by the predicate follows
Mithun (1994:255) in her description of the ABSOLUTIVE as the “participant that is the most immediately or
directly involved in an event or state”.

In example (1), S in Porohanon takes the ABS common noun marker an [ʔan]. The single core argument
puza ‘child’ of the intransitive verb mobarog ‘will stand’ is considered an “S” in our heuristic. In
this situation, the child is both the source of the action of standing (mobarog) and the entity affected by
it.


(1) Mobarog an puza
    mo-barog             an=puza
    IRR.IPFV.INTR-stand  ABS=child
                         S
    ‘The child will stand.’ (#272, UP Dept. of Linguistics 775-sentence list)


Let us look at example (2), this time with a second referential expression. Since ara [ʔa.ɾaʔ] ‘EXIST’
predicates on the existence of only a single core argument ang daga ‘the young lady’, the second
referential expression sa bay ‘in the house’ is not as immediately involved in the clause. It only serves
to indicate the location of the young lady’s existence. It is thus encoded as an OBL. Therefore, we can
consider ang⁷ as the marker of S.


(2) Ara sa bay ang daga.
    Ara    sa=bay     ang=daga
    EXIST  OBL=house  ABS=young.lady
                      S
    ‘The young lady is in the house.’ (#5, UP Dept. of Linguistics 775-sentence list)


Example (3) shows a very prototypical transitive construction wherein the source of action (A) is fully 
differentiated from the most affected entity (O). Moreover, the action expressed by the verb itself and 
its voice morphology indicates a very effortful and intentional act on the part of the entity encoded as 
A (Nolasco 2009).


(3) Giputol sa taw an kahoy gamit an sundang.
    gi-putol          sa=taw      an=kahoy  gamit     an=sundang
    REAL.PFV.TR-chop  ERG=person  ABS=wood  STAT.use  ABS=axe
                      A           O                   E
    ‘The person chopped the wood using an axe.’
    (#347, UP Dept. of Linguistics 775-sentence list)


Setting aside for now the extended argument an sundang ‘the axe’, one can observe that the main clause 
shows the person taw marked with sa and the wood kahoy marked with an. The shared marking of the 
S and the O with the common noun marker an points toward an ergative syntactic alignment on the 
formal level. This alignment on the formal level is what Reid and Liao (2004) seem to acknowledge,
yet they stop short of labelling their cases ERG and ABS, preferring the “typologically-neutral”
labels NOM and GEN.
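
The formal side of this argument, comparing which of S, A and O share a marker, can be sketched as a small Python function. This is a hedged illustration only: the function name classify_alignment and the alignment labels it returns are assumptions introduced here, drawn from general typological usage rather than from the sources cited, and the sketch says nothing about the semantic and discourse evidence discussed in connection with Nolasco (2009).

```python
def classify_alignment(s, a, o):
    """Classify case alignment from the markers found on S, A and O."""
    if s == o and s != a:
        return "ergative-absolutive"    # S patterns with O
    if s == a and s != o:
        return "nominative-accusative"  # S patterns with A
    if s == a == o:
        return "neutral"                # no formal distinction at all
    if a == o:
        return "double-oblique"         # A and O share a marker that S lacks
    return "tripartite"                 # three distinct markers


# Markers as they appear in examples (1) and (3): S and O take an, A takes sa.
porohanon = {"S": "an", "A": "sa", "O": "an"}
print(classify_alignment(porohanon["S"], porohanon["A"], porohanon["O"]))
```

With the markers from examples (1) and (3), the function returns the ergative-absolutive label, since S and O share an while A takes sa.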


7 As one will notice in Table 2, the final nasal of the ABS common noun marker of Porohanon may be realized
as either [n] or [ŋ]. Wolff (1967:66) and Zorc (1977:85) both record the form with the final alveolar/dental
nasal (“qan” in their transcription). My language consultant also makes the noteworthy claim that an [ʔan] is
the one that is “original Porohanon” (J. Andriano, personal communication, 22 April 2018).

Instances of assimilation to the velar position such as the one documented in example (2) may well be just
a synchronic phonological process. However, we cannot discount the status of this alternation as a
sociolinguistic variable possibly pointing to Cebuano’s pervasive and continuous influence on Porohanon
since other Central Bisayan languages such as Waray and Masbatenyo (Rosero 2011:44) retain an to a higher
degree.

Regarding the predicate giputol ‘chopped’, indeed, this action of chopping a piece of wood with an
instrument such as an axe involves much effort and intention on the part of its source, a person. Also, 
the entity being chopped, a piece of wood, undergoes a transformation in its physical state, what Nolasco 
(2009:8) in his revision of Hopper and Thompson’s (1980) transitivity parameters categorizes as “total 
affectedness”. 

Nolasco (2009:22) adds that “most work subscribing to the ergative analysis has tended to focus 
on the formal aspects of the phenomenon, downplaying its semantic, pragmatic and discourse 
motivations... The meaning-based and formal evidence points to the robust manifestation of the 
ergative-absolutive relation in Philippine-type languages...” Thus, this excursus into transitivity in 
Porohanon has been essential in demonstrating its ERG-ABS syntactic alignment. 


On the distinction between the grammatical relations ERG and GEN 

Something needs to be said about distinguishing the grammatical relations ERG from GEN, even if their 
common noun marker forms are identical.⁸ I posit that the source of the action in a transitive
construction, the ERG, is a grammatical relation distinct from the possessor (Kroeger 2005:104), the 
GEN. The homophony of the forms for these two grammatical relations can be observed in example (4). 


(4) Gisuwat sa taw an pangan sa daga
    Gi-suwat           sa=taw         an=pangan  sa=daga
    REAL.PFV.TR-write  ERG=young.man  ABS=name   GEN=young.lady
    ‘The young man wrote the young lady’s name.’
    (#354, UP Dept. of Linguistics 775-sentence list)


The ERG-marked argument taw ‘young man’ is the source of the action gisuwat ‘wrote’ which affects 
the argument pangan ‘name’, marked ABS. The action of the young man gives rise to the name on the 
written page. Meanwhile, the one who possesses that name, the daga ‘young lady’ is in a different case, 
GEN. 


On the apparent homophony among the ERG/GEN and OBL common noun markers 

Not only do the markers for ERG and GEN fall together; the common noun marker for OBL takes the same
form, as seen in the referential expression sa bay ‘in the house’ in example (2). The linguist who wishes to
describe the common noun marker system of Porohanon must now contend with three distinct grammatical
relations (ERG, GEN, and OBL) converging on one form: sa [sa].

Wolff (1967), later cited by Zorc (1977), did not record this phenomenon in Porohanon. This 
homophony was probably not yet apparent in the late 60s in the speech of John Wolff’s language 
consultants (Wolff 1967:78). Zorc, however, makes the important observation that “the Ceb [Cebuano] 
oblique and definite genitive markers are homophonous (sa).” (1977:97) Ballo (2011:67) also notes that 
the marker san is already falling into disuse among speakers of Porohanon. 

Once again, like the case of the ABS common noun marker an [ʔan] / ang [ʔaŋ] discussed in
footnote 7, one can no longer hold that this is simply a process of apocope, or the loss of the final 
segment of a morpheme (Crowley & Bowern 2010). Could this be another indication, a sociolinguistic 
variable, that Porohanon is becoming more and more like Cebuano? 


8 McFarland (1978:140) rightly observes that “The various cases in Philippine languages are not so clearly or
discretely marked as in many languages.”

9 Notably, I was able to interview Atty. Lourdito Borlasa, one of Wolff’s original language consultants, when
I conducted fieldwork in Poro, Camotes, Cebu in 2018. He is of the opinion that Porohanon is already
a dialect of Cebuano, just as Wolff argued in his 1967 paper. My other language consultants, on the other 
hand, stress the unintelligibility of Porohanon to Cebuano speakers. They can understand and can easily 
switch to Cebuano, but Cebuano speakers from the “mainland” have a difficult time understanding their 
tinaga-Poro ‘(lit.) from Poro’ speech. (E. Marquez, personal communication, 12 December 2018). The 
following excerpt from Lobel (2006:915) suggests an extended history of contact between Porohanon and 

Definiteness and specificity in Philippine languages
Among the notions of nominal anchoring, definiteness seems to be the one that has received the most 
coverage in key works of Philippine morphosyntax. Despite the frequent reference to this notion, 
different scholars seem to have different versions of it and acknowledge different manifestations of it. 
Constantino, for example, considers the “simple, predicative, definite sentence” (1965:108, 
1971:1) to be the kernel sentence from which other construction types are derived through 
transformations. Constantino (1971:2) later elaborated on this notion, stating that “they [simple,
predicative, definite sentences] are definite in that both their subject and predicate are ‘marked’, that is,
each is preceded or followed by an article or affix.” The “Definiteness Hypothesis” of Constantino, then,
seems to hinge on the mere presence or absence of an article or affix in the subject and predicate of
a sentence. Thus, Tagalog sentence pairs such as (5) and (6) are treated as definite and indefinite,
respectively, based on the presence of ang in the first constituent (English glosses are Constantino’s
own):


(5) Ang bata ang kumain sa mangga.
    Ang=bata   ang=k<um>ain       sa=mangga
    ABS=child  ABS=<INTR.PFV>eat  OBL=mango
    ‘It was the child who ate the mango.’ (Constantino 1971:66)


(6) Bata ang kumain sa mangga.
    Bata   ang=k<um>ain       sa=mangga
    Child  ABS=<INTR.PFV>eat  OBL=mango
    ‘It was a child who ate the mango.’ (Constantino 1971:66)


Cubar ([1975] 2019) wrote an extended critique of this analysis, criticizing Constantino, among other 
things, for his decision to label the second ang in sentences such as those above a “predicate marker”. 


“It is obvious that we have here a different notion of what a noun phrase is. For Constantino, ang 
tumakbo ‘the one who ran’, ang kinain ng bata ‘that which was eaten by the child’, ang maganda ‘the 
one that is pretty’, and ang nasa kahon ‘that which is in the box’ are not noun phrases because their 
heads or centers are not nouns. He would call the first two phrases verb phrases, the third adjective 
phrase, and the last particulate phrase. However, these phrases have unmistakable nominal readings. 
They are what linguistic philosophers call definite reference expressions — expressions which are used 
for naming objects.” (Cubar [1975] 2019:73) 


Cubar still anchors the notion on the presence of article/affix-marking in the sentence constituents. 
However, he offers a more detailed discussion of referential expressions based on “degrees of 
definiteness”. A definite common noun, according to him, “derives its definiteness either from the 
presence of its referent in the common immediate environment of the speaker and the hearer, or from 
linguistic anaphora, including the use of definitizing attached relative clause.” (Cubar [1975] 2019:84) 
Addressing another point of contention in Constantino’s analysis, Cubar writes that “A speaker uses the 
definite form of a noun phrase when he assumes that the existence of a referent has been registered in 
the consciousness of the hearer, or when he believes that the referent has been sufficiently described 
such that it has a determined identification for the hearer.” (Cubar [1975] 2019:91) 

McFarland problematized the same issue three years later in his article Definite objects and subject 
selection in Philippine languages (1978) beginning with his conception of reference already cited 
earlier in this paper. “Definite reference”, according to McFarland (1978:153), “indicates that the 
referent is specific and known to the speaker and known to the addressee.” “Indefinite reference”, on 
the other hand, “indicates that the referent is non-specific or unknown to the speaker or unknown to the 
addressee.” Reid and Liao (2004:469) write that “In all Philippine languages, Nominative phrases 
typically have a definite interpretation, that is, the speaker assumes that the addressee knows the general 
reference of the actant which is the head of the phrase.” An “indefinite actant”, on the other hand, “is 
typically expressed by a phrase carrying the Correspondent case relation in an intransitive clause and is 
marked with a Locative, Genitive, or Oblique Determiner...” 


Cebuano: “It is also interesting that the dialects of the oldest settlements in Baybay, Leyte, (C. Rubino, 
personal communication), and the Camotes Islands (Wolff, 1967) show a Warayan substratum, indicating 
that Waray-Waray was much more widespread in previous centuries before the expansion of the Cebuanos 
in the mid-1800s (Larkin, 1982).” 

Let us now turn to the notion of specificity. McFarland (1978:151) had already discussed 
specificity in relation to his notion of definiteness. According to him, “A referent may be specific, non- 
specific, or generic. In the first case, the speaker is saying something about a particular, i.e., specific 
individual.” A speaker refers to something generic when s/he refers to “a whole class of entities” or 
uses a form “intended to apply to all members of that class.” Therefore, “[i]f the referent is specific, it 
has an identity. That is, it has an existence which is distinct from all other referents, even those which 
may be very similar to it.” (McFarland 1978:152) 

McFarland’s crucial contribution lies in his disambiguation of specificity from definiteness: “The 
identity of a specific referent may or may not be known to the speaker. When I say that a speaker knows 
the identity of a referent, I mean that (1) he is in possession of a file of prior knowledge about that 
referent and that (2) he is aware that the referent currently being discussed is the same as the one to 
whom this prior knowledge relates.” (McFarland 1978:152) 

McFarland’s sensitivity to the negotiated nature of discourse is evident in the following statement: 


“If the identity of the referent is known to the speaker, he must make a judgment as to whether the 
addressee also knowns [sic] the identity. If he judges that the addressee already possesses a file of 
information about the referent, the speaker must choose a linguistic expression which will enable the 
addressee to locate that file and to add the new information to what is already there. If he judges that 
the addressee does not have such a file, he may choose a linguistic expression which indicates to the 
addressee that he is not expected to have a file on the particular referent, and that he may or may not 
choose to open such a file. Such information is carried by ‘indefinite noun phrases’ and existential 
sentences.” (McFarland 1978:152) 


Reid and Liao (2004) maintain the distinction between definiteness and specificity by writing that: 


“Knowing the general reference of an actant does not imply that the addressee knows the specific actant 
being referred to. Although Nominative phrases are typically definite, they may or may not be specific. 
The degree of specificity often depends on the presence of a demonstrative, either as the head of the 
noun phrase, or as a post-head modifier, or on the presence of some other post-head modifier such as a 
genitively marked noun phrase, or a relative clause. A number of languages mark a distinction between 
specific and non-specific phrases, with the specific phrase being invariably marked by a form which is 
either a demonstrative, or can be shown to have been a demonstrative at some earlier stage of the 
language.” (Reid and Liao 2004:471) 


Table 3 summarizes the key points pertaining to definiteness and specificity across the cited literature. 
Table 3: Summary of notions of definiteness and specificity in Constantino (1965, 1971), Cubar 
([1975] 2019), McFarland (1978), Reid and Liao (2004) 


Category | A | B | C | D | E 
Notion | Constantino 1965 | Constantino 1971 | Cubar [1975] 2019 | McFarland 1978 | Reid and Liao 2004 
Definiteness | - | FORMAL CRITERION: Presence of subject/“predicate” marking thru article or affix | FORMAL CRITERIA: Presence of constituent marking thru article or affix, Linguistic anaphora, Definitizing attached relative clause; SEMANTICO-PRAGMATIC CRITERIA: Presence of referent in the common immediate environment of the speaker and hearer, Registration in the consciousness of the hearer, Determined identification for the hearer | FORMAL CRITERION: Reference expression marking; SEMANTICO-PRAGMATIC CRITERION: Specificity and knownness to the speaker and addressee | FORMAL CRITERION: Case-marking (Nominative typically for definite; Locative, Genitive, or Oblique for indefinite); SEMANTICO-PRAGMATIC CRITERION: Assumption of the speaker that the addressee knows the general reference of the actant 
Specificity | - | - | - | FORMAL CRITERION: Reference expression marking; SEMANTICO-PRAGMATIC CRITERIA: Particularization of an individual, Existence which is distinct from all other referents | FORMAL CRITERION: Presence of a demonstrative, Presence of some other post-head modifier such as a genitively marked noun phrase or relative clause 


Definiteness and specificity in Porohanon 


We return to the discussion of the common noun marker forms of Porohanon. Wolff (1967) 
distinguishes the first row of markers as “definite” from the second row which are “indefinite”. Much 
like Constantino, however, he takes as a given the reader’s notion of definiteness and does not elaborate 
on it anywhere else in the paper: 


Table 4: Porohanon common noun markers (revised according to the Wolff (1967) classification) 


 | ABS | ERG | GEN | OBL 
Definite | an [ʔan] / ang [ʔaŋ] | san [san] / sa [sa] | san [san] / sa [sa] | sa [sa] 
Indefinite | in [ʔin] / =y [j] | sin [sin] | sin [sin] | 


Zorc (1977:84) stipulates that for the entire Bisayan complex, “All dialects that have two genitive 
markers can make a distinction between definite and indefinite.” He recognizes the “differences in 
formation” of these common noun markers, stating that the vowel [a] almost always occurs in “general, 
definite, or past” markers and the vowel [i] almost always occurs in “indefinite or nonpast” markers 
(Zorc 1977:86). 

How does definiteness, purported to be the feature that distinguishes the two forms in the ABS, 
operate in naturalistic speech? The following excerpt is from a Pear Film narrative (Chafe (Ed.) 1980)¹⁰ 
recorded with Mr. Joseph Andriano, 33 years old (at the time of recording), from Brgy. Teguis, Poro, 
Camotes, Cebu. After watching the video and thinking of what to say for a few moments, he begins by 
introducing his narrative, as in (6). 


(6) Ara nakoy istorya nimo, Sir Vincci 
ara=na=ko=y=istorya (ka)nimo Sir Vincci 
EXIST=COMPL=1SG.GEN.POS=ABS.INDF=story 2SG.GEN.PRE Sir Vincci 


‘I already have a story for you, Sir Vincci.’ (Joseph Andriano — Pear Story, 0:00) 


Wolff (1967:66) observed that “The indefinite Cebuano marker y refers to an indefinite subject only in 
certain set of expressions.” Tanangkingsing (2011:146) later wrote that “...the referent is indefinite and 
is marked by the neutral marker =y that phonologically attaches to the preceding unit.” 

Wolff states that Porohanon makes a distinction that Cebuano does not make, in that it still has two 
unreduced forms for the “subject marker” an versus in. This is reflected in Table 4. It seems, though, 
that contemporary speakers of Porohanon like Sir Joseph tend to reduce this indefinite ABS marker, just 
like in Cebuano. 

The phonological change is not yet absolute, however, since there are still instances where =y 
retains its full form [ʔin]. See sentence (7) from the article PESO¹¹ MANG REHESTRO from the 
Porohanon Newsletter (September-October, 2012:1). 


(7) Inin pagpang rehistro nga gibuhat sa PESO 
inin pag-pang-rehistro=nga=gi-buhat sa=PESO 
ABS.DEM.PROX INF-DISTR-register=LNK=REAL.PFV.TR-do OBL=PESO 
wa ini in bazad. 
wa ini in=bazad 
NEG LOC.PROX ABS.INDF=payment 


‘This registration to PESO, this has no payment.’ 


Now, let us return to Sir Joseph’s Pear Story narrative to see how the supposed definite counterpart of 
the ABS common noun marker functions. Our young male protagonist has gone off after his fall from 
the bicycle he was riding. However, Sir Joseph said that the young boy had forgotten his hat: 


(8) Nan sara, nalimtan na niza, uh, 
nan sara n(k)a-lim(o)t-an=na=niza uh 
CONJ now REAL.PFV.(STEM).HAPP-forget-APPL=COMPL=3SG.GEN.POS HES 
an iza kalo. 
an=iza=kalo 
ABS.DEF=3SG.GEN.PRE=hat 
‘And now, he had already forgotten his, uh, his hat.’ (Joseph Andriano — Pear Story, 02:13.01) 


In this example, the common noun marker an is selected for iza kalo ‘his hat’. Designating an as definite 
within the larger context of discourse would be peculiar because this is the first time this referent is 
introduced. Without the aid of the video, it is only through sentence (8) that our young male protagonist 
is specified as actually wearing a hat. If this is the very first time this referent iza kalo ‘his hat’ is 
“pushed into the scene” in this narrative, why is it being marked with a supposedly “definite” common 
noun marker when there is no prior knowledge of this referent on the addressee’s part? 


10 The video is available on YouTube: https://www.youtube.com/watch?v=bRNSTxTpG7U&t=154s. 
11 Public Employment and Service Office 

Specificity might be the more appropriate notion to associate with this form, especially since Reid 
and Liao (2004) consider the presence of genitive marking an indicator that this referent is specific. 
Indeed, with the third-person, genitive, preposed pronoun iza, the speaker’s proposition is that the kalo 
‘hat’ is only the young male protagonist’s and no one else’s. 

Let us now examine the ERG and GEN common noun markers. In this part of the story, 
the three children met by the young male protagonist have returned the hat which he had dropped. In 
return, the boy who owns the hat gives them each a pear. The children then part ways, and the three 
who helped the boy on the bicycle walk along the path toward the farmer picking pears from a tree, as 
in (9). 


(9) An tulo ka puza, padung didto sa nangipo sin peras. 
an=tulo ka-puza pa-dung didto sa=n(p)ang-ipo sin=peras 
ABS=three-ENUM-child CAUS-go LOC.DIST OBL=REAL.PFV.DISTR-pick GEN.INDF?=pear 


‘The three children went back to the one who was picking pears.’ 
(Joseph Andriano — Pear Story, 02:35.01) 


Analyzing the common noun marker sin as indefinite in the referential expression sin peras does not 
seem to work because the very beginning of the Pear Film is a shot of a middle-aged man up on a ladder 
in a tree picking pears. More importantly, it has already been established early on in Sir Joseph’s 
narrative that this man in the video is, indeed, a man who picks pears. To consider sin as indefinite 
would be to ignore this knowledge of the referent shared by the speaker and addressee. 

Analyzing it as nonspecific, however, would yield a more accurate reading. Sin could be classified 
as nonspecific since it was shown that the farmer had already picked multiple baskets of pears. Sir 
Joseph’s proposition here is that the farmer is a person who simply picks pears: not one particular, 
individuated pear, but rather just entities that would be considered pears. 

Can we, therefore, consider san/sa the specific counterpart? Consider sentence (10), which 
comes from the shared experiences of Mr. Abel Garciano, a municipal official at the Local Government 
Unit of Poro, when Super Typhoon Yolanda hit Eastern Visayas in 2013. In sentence (10), we see the 
ERG marker sa marking the common noun bagzo ‘storm’. Bagzo is in turn modified by the relative 
clause nga Yolanda ‘which is Yolanda’. Following the criteria of McFarland (1978) and Reid and Liao 
(2004), the particular, individual referent of bagzo is specified not only through the use of the 
marker sa, but also through the relative clause nga Yolanda. 


(10) Diin ang Cebu giagian sa bagzo nga Yolanda. 
diin ang=Cebu gi-agi-an sa=bagzo=nga=Yolanda 
LOC.Q.PST ABS=Cebu TR.REAL.PFV-pass-APPL ERG.SPEC=storm=LNK=Yolanda 


‘Where the storm which is Yolanda passed by Cebu.’ 
(Abel Garciano — Unforgettable experience, 14:24) 


Conclusion 

Evidence from recordings of continuous discourse has shown that in terms of nominal anchoring, the 
common noun markers of Porohanon seem to encode a SPECIFIC versus NONSPECIFIC reading of the 
referent, rather than a DEFINITE versus INDEFINITE distinction as earlier assumed by Wolff (1967) and 
Zorc (1977). Thus, I propose the common noun marking system for Porohanon in Table 5. 
Table 5: Revised Porohanon common noun markers 


 | ABS | ERG | GEN | OBL 
Specific | an [ʔan] / ang [ʔaŋ] | san [san] / sa [sa] | san [san] / sa [sa] | sa [sa] 
Nonspecific | in [ʔin] / =y [j] | sin [sin] | sin [sin] | 


Does this mean that definiteness as a nominal anchoring strategy is totally absent from the system of 
Porohanon? Not necessarily: it may be encoded in other forms of referential expression marking, 
such as deictics, pronouns, and suffixes. 
Abstract 

In this paper, we study a new type of pronominal item emerging on the Internet in 
Vietnamese and Chinese. First, we demonstrate that pronominal items of this new type, 
which we dub “noncanonical”, are a separate category from both textbook default 
pronouns and imposters (Collins and Postal 2012). Then, we illustrate their real-life usage 
in detail. Our investigation shows that noncanonical pronouns in the two Asian languages 
are similar not only in syntactic behavior but also in lexical sources, based on which we 
propose three subtypes for them. Finally, we account for the half-grammatical-half-lexical 
status of noncanonical pronouns in the theory of generalized root syntax (Song 2019), a 
recent version of syntactic root theory. We also suggest a link between the propensity for 
noncanonical grammatical elements and high analyticity. 


Keywords: Vietnamese, Chinese, pronoun, Internet, root syntax, analyticity 
ISO 639-3 codes: vie, zho 


1. Introduction 


1.1 Pronominal items in previous research 

Previous research has documented two types of personal pronominal items in human language. The 
first type is the default personal pronoun (henceforth default pronoun),¹ namely the kind of pronoun 
typically seen in language textbooks and reference grammars, as exemplified in (1). 


(1) a. English: I, you, he, she, it... 
b. German: ich, du, er, sie, es... 
c. Vietnamese: tôi, bạn (lit. ‘friend’), anh ta, cô ta... 
d. Mandarin: wǒ, nǐ, tā (‘he/she/it’)... 


Such pronouns are not necessarily the most often used, especially in languages like Vietnamese, but 
they are deemed textbook standards and are also the most widely studied type of pronominal item in 
linguistics. Textbook pronouns can be viewed as exponents of formal features, especially person, 
number, and gender (i.e., phi) features and occasionally also honorific features, as exemplified in (2). 


(2) a. I am a student. (I = [1SG]) 
b. Möchten Sie etwas zu trinken? [German] 
would.like.3PL you-HON something to drink 
“Would you like something to drink?” (Sie = [2SG/PL, HON]) 


1 Since we are only concerned with personal pronouns in this study, for simplicity’s sake we use “pronoun” to 
mean “personal pronoun” throughout. By “default” we mean to contrast the kind of pronominal item in (1) 
with the imposters and especially the noncanonical pronouns to be introduced below. 
In (2a), the English pronoun I is the exponent of the featural specification [1SG]. In (2b), the German 
pronoun Sie is the exponent of [2SG/PL, HON]. A hallmark of textbook pronouns is that their grammatical 
behavior can be explained solely by their formal features. 

The second type of pronominal item documented in previous research is the “imposter” (Collins 
and Postal 2012). An imposter looks like an ordinary referring expression (R-expression) and is subject 
to third-person agreement in languages where syntactic agreement is required, but it semantically refers 
to the speaker or the addressee instead of a real third person. See (3) for an illustration. 


(3) a. Daddy (= I) is going to get you an ice-cream cone. 
b. Is the general (= you) going to dine in his suite? (Collins & Postal 2012:1–3) 


Imposters carry more pragmatic content than textbook pronouns. Thus, daddy in (3a) sounds more 
affectionate than I, and the general in (3b) sounds more formal than you. Descriptively we can say that 
imposters are R-expressions employed to refer to the speaker/addressee. There are many more imposters 
in English, such as yours truly (= I), this reviewer (= I), Madame (= you), sweetie (= you), etc. Imposters 
are also widely attested in other languages (see the articles in Collins 2014). In fact, they are used so 
frequently and naturally in some languages, including Vietnamese, that they may be viewed as the de 
facto default pronominal items there, taking over the role of textbook standard pronouns. Imposters 
have not taken over the role of textbook pronouns in Chinese but are also highly common there. See (4) to 
(6) for a comparative illustration of Vietnamese and Chinese² imposters, which fall into three subtypes: 
kin terms, career titles, and personal names. 


(4) a. Hôm-nay mẹ sẽ nghỉ làm. [Viet.] 
today mom.1SG FUT stop work 
“Mom (= I) is having a day off today.” (a mom talking to her child) 
b. Jīntiān māma zuò yú-tāng. [Mandarin] 
today mom.1SG make fish-soup 
“Today mom (= I) will cook fish soup.” (a mom talking to her child) 


(5) a. Thầy về-hưu rồi. [Viet.] 
teacher.1SG retire PRF 
“Teacher (= I) has retired.” (a teacher talking to their student) 
b. Lǎoshī yě bù zhīdào dá’àn. [Mandarin] 
teacher.1SG also not know answer 
“Teacher (= I) don’t know the answer either.” (a teacher talking to their student) 


(6) a. Linh hiểu chưa? [Viet.] 
Linh.2SG understand IMPERF 
“Has Linh (= you) understood yet?” (the addressee’s name is Linh) 
b. Lingling bù chī qiǎokèlì ma? [Mandarin] 
Lingling.2SG not eat chocolate Q 
“Lingling (= you) don’t eat chocolate?” (the addressee’s name is Lingling) 


Although previous studies on Vietnamese rarely (if ever) use the term “imposter”, that the boldfaced 
items in (4a) to (6a) qualify as imposters is clear: they are R-expressions employed to refer to the 
speaker/addressee. 


2 Unless otherwise specified, all our Chinese data are from Standard Mandarin. We also use the term Common 
Mandarin (i.e., the Mandarin variety commonly spoken in daily life) when discussing Internet language 
phenomena that have not yet been officially recognized as part of Standard Mandarin. 
Overall, (4) to (6) reveal that imposters are used a lot more freely in Vietnamese³ and Chinese than 
in English and that as such they are not a crosslinguistically homogeneous phenomenon.⁴ A most 
noticeable point of variation concerns agreement patterns. Since Vietnamese and Chinese both lack 
formal agreement, their imposters do not manifest the kind of mismatch between grammatical and 
notional person discussed at length in Collins and Postal (2012). Note that while the English translations 
all manifest third-person agreement (e.g., is, has), there is no syntactic agreement in the Vietnamese 
sentences. The same is true for the Chinese examples in (4b) to (6b). For this reason, Wang (2014) calls 
Chinese-style imposters “pseudo-imposters”. For expository convenience we will keep using the 
umbrella term “imposter” when there is no risk of ambiguity (as Wang himself does). 

Quite often, the referents of imposters are contextually determined. In (4a) and (4b) the kin terms 
meaning “mom” are glossed as 1SG because they are uttered by mothers to their children, but the same 
sentences can be uttered by children to their mothers, and in that case the same kin terms will be glossed 
as 2SG. There may be crosslinguistic variation as to how much imposter interpretation depends on 
context, but it is clear that imposters can have context-dependent person indexing (or “floating 
reference”, see Alves 2007 for more on Chinese influence on Vietnamese pronouns), unlike textbook 
pronouns, whose person indices are lexically fixed (e.g., I can never mean “you”). 


1.2 A new type of pronominal item 
Textbook pronouns and imposters, however, are not the only types of “pronouns” out there. Observe 
the sentences in (7). 


(7) a. Mị thề là Mị không soi cái bụng. [Viet.] 
Mị.1SG swear COP Mị.1SG NEG zoom.into CLF belly 
“Mị (= I) swear Mị (= I) didn’t zoom into (your) belly.” 
b. Mị không hiểu. Các chị hiểu hông? 
Mị.1SG NEG understand PL sister understand NEG 
“Mị (= I) don’t understand. Do sisters (= you) understand?” 


In (7) the Vietnamese word Mị is used in place of a first-person pronoun. Mị is originally the name of 
a character in an old literary work (Vợ chồng A Phủ ‘Couple A Phu’), who suffered a lot of injustice in 
the old days but stood up for herself and fought for her own happiness. Perhaps inspired by her story, 
contemporary netizens (mainly females, but occasionally also males) sometimes use her name as a term 
of self-address with a joking tone. In (7a), for instance, the netizen says “Mị swear Mị don’t feel upset” 
instead of “I swear I don’t feel upset” when commenting on her growing belly size; and similarly, in 
(7b), the netizen says “Mị don’t understand” instead of “I don’t understand” to deliberately sound naive. 

There are a number of reasons why items like Mị differ from imposters. First, although Mị is 
originally a personal name, its pronominal usage in (7) is clearly different from that of the personal 
name Linh in (6a). Specifically, Linh can refer to either the speaker or the addressee whereas Mị can 
only refer to the speaker. Second, when Linh is used pronominally, there is actually someone named 
Linh in the discourse, whereas Mị could refer to any speaker, similarly to “I” in English. Mị thus behaves 
more like a textbook pronoun than an imposter, except that it carries idiosyncratic extragrammatical 
and pragmatic effects and is mainly restricted to online usage. 


3 As Paul Sidwell pointed out, it is important to note that much of the standard Vietnamese pronoun system 
was historically replaced by imposters. 


4 Between Vietnamese and Chinese, imposters are used even more freely in Vietnamese. 
This phenomenon is not exclusive to Vietnamese but is also observed in Chinese. See (8) for instance. 


(8) a. Jiù bù gěi āijiā xīnlǐ jiànshè [Mandarin] 
just not give mourner.1SG psychological construction 
shíjiān. Zhè jiù qīn-shàng le. Méi-rén gàosu āijiā 
time now just kiss-up CRS no-person tell mourner.1SG 
háiyǒu qīn dì’èr-huí a. 
still kiss second-round EXCL 
“They just don’t give mourner (= me) any time to be psychologically ready. They just 
begin to kiss right away. No one told mourner (= me) that they were going to kiss again.” 
(Sina Weibo⁵) 
b. Guǒrán shùxué de shìjiè shì méi-yǒu sècǎi de. 
as.expected math POSS world is not-have color NMLZ 
Āijiā fá le. 
mourner.1SG tired CRS 
“The world of math is colorless just as expected. Mourner (= I) is tired.” 

Here the item of interest is āijiā, an ancient term of self-address that literally means “mourner” and was 
originally used by empress dowagers (i.e., emperors’ mothers). Contemporary netizens (mostly young 
females, occasionally also males) often use it in a jocularly arrogant tone. Thus, the two speakers in (8) 
respectively complain about an unexpected kissing scene on TV and the difficulty level of math, both 
sounding assertive and much more fun than if the default 1SG pronoun wǒ were used.⁶ Note that there is 
an ongoing debate as to whether empress dowagers in Chinese history had really called themselves āijiā 
or whether this was just a coinage of ancient playwrights, who then passed it on to modern scriptwriters 
(see Chen 2009). In spite of this, however, there is no doubt that the online term āijiā is borrowed from 
historical contexts and that it synchronically behaves like a 1SG pronoun with special pragmatic effects, 
similarly to Vietnamese Mị. 

In sum, pronominal items like Vietnamese Mị and Chinese āijiā, which represent a fashionable 
linguistic phenomenon in the Internet era, constitute a distinct category (see more examples in §2). 
Unlike imposters, they do not have flexible, context-dependent person indexing or common 
R-expression usage. Unlike textbook pronouns, they are not exponents of formal features but carry 
idiosyncratic extragrammatical effects. We dub them “noncanonical pronominal items” as a working 
term. Furthermore, noncanonical pronominal items also differ from imposters in terms of their history, 
typology, and lexical materials. We summarize these distinctions in Table 1. 


Table 1: Differences between imposters and noncanonical pronominal items 


 | Imposters | Noncanonical pronominal items 
Usage | wide in real life | limited to certain registers 
History | long | emerging (mainly online) 
Typology | prevalent in many languages | available in far fewer languages 
Reference | flexible (contextual) | fixed (lexical) 
R-expression usage | yes | no 
Lexical material | nouns in contemporary use | miscellaneous 


5 Sina Weibo is the Chinese counterpart of Twitter. 
6 Also notice the speaker’s choice of verb in (8b) in the vicinity of āijiā. Here, for “tired”, the ancient-sounding 
fá is used instead of the synchronically more common lèi. This sort of stylistic or register-based agreement is 
commonly observed in Chinese (see Feng 2010 et seq.). 
As Table 1 shows, imposters have been in use in both Vietnamese and Chinese for a long time, whereas 
noncanonical pronominal items have only recently emerged in the Internet era. Imposters are attested 
in many languages (see Collins and Postal 2012 and Collins 2014), whereas noncanonical pronominal 
items are, to our knowledge, less prevalent. Finally, imposters always have contemporary nominal 
counterparts, which allows them to be used as ordinary R-expressions, whereas noncanonical 
pronominal items have miscellaneous lexical sources (e.g., Mị recycles a fictional character name, and 
āijiā recycles an ancient term of self-address). These distinctions set noncanonical pronominal items 
apart from imposters. 

Despite their particular characteristics, however, noncanonical pronominal items have not been 
well documented. The aim of our article is thus to present a preliminary investigation of these items in 
contemporary usage and situate them in modern syntactic theory. To do so, we will first present 
noncanonical pronominal items in Vietnamese and Chinese in more detail (§2), then refine the 
crosslinguistic pronominal item taxonomy (§3), and eventually incorporate noncanonical pronominal 
items in the syntactic theory of pronouns (§4). Finally, we conclude with a few points for future research 
(§5). 


2. Subtypes of noncanonical pronominal items 
Noncanonical pronominal items in Vietnamese and Chinese fall into three subtypes based on their lexical 
sources: revived ancient terms (§2.1), dialectal terms (§2.2), and creative online coinages (§2.3). 


2.1 Subtype I: Revived ancient terms 

The first subtype of noncanonical pronominal item we have identified in both Vietnamese and Chinese 
is that of revived ancient terms. Such terms can be either literary, like Vietnamese Mị ‘female character 
name.1SG’, or (quasi) royal, like Chinese āijiā ‘mourner.1SG’. Table 2 contains more examples 
belonging to this subtype. We provide literal glosses in single quotes and original/historical usage 
restrictions in parentheses. 


Table 2: Subtype-I noncanonical pronominal items in Vietnamese and Chinese 


Vietnamese | Mị ‘female character name.1SG’ 
Vietnamese | trẫm ‘emperor.1SG’ (by emperors) 
Vietnamese | ái-phi ‘beloved-concubine.2SG’ (by emperors or princes to their concubines) 
Chinese | āijiā ‘mourner.1SG’ (by empress dowagers) 
Chinese | zhèn ‘1SG’ (by emperors) 
Chinese | guǎ-rén ‘lacking-person.1SG’ (by pre-Qin state rulers) 
Chinese | běn-gōng ‘this-palace.1SG’ (by emperors’ sons and wives/concubines to inferiors) 
Chinese | chéngqié ‘slave.1SG’ (by emperors’ wives/concubines to superiors) 
Chinese | qīng ‘2SG’ (by emperors to royal officials or between husbands and wives) 


Vietnamese trẫm and ái-phi have respectively been borrowed from Chinese zhèn and ài-fēi,⁷ and have 
become increasingly popular via hugely successful TV series such as My Fair Princess. Zhèn had 
originally been a default 1SG pronoun in Old Chinese (9a) but was reserved for emperors in the Qin dynasty 
(221 B.C.E.) (9b). In the pre-Qin era, vassal state rulers humbly referred to themselves as guǎ-rén (9c), 
which literally means “a person who lacks virtue”. 


7 However, these cognates have clearly developed different uses in the two languages. First, Chinese zhèn has 
stayed a purely pronominal item even after being reserved for emperors, whereas Vietnamese trẫm seems to 
be in the process of further lexicalization (hence our different glosses for them). An informal survey reveals 
that Vietnamese speakers tend to think trẫm means “emperor”, even though the term has no common 
R-expression usage (otherwise it would be an imposter like thầy ‘teacher’). Second, while Vietnamese ái-phi 
is a noncanonical pronominal item, Chinese ài-fēi is an imposter (hence its absence from Table 2), for it has 
common R-expression usage, as in shénmì wángyé de ài-fēi ‘the beloved concubine of the mysterious prince’ 
(novel title). 


Papers from SEALS 30 — Song and Nguyen 


(9) 


»8 


Zhén  huangkdo ——yué boyong. [Old Chinese]? 
1SG ancestor is.called Boyong 

“My ancestor’s name is Boying.” (The Lament, 3" century B.C.E.) 

Tian-zi zi-chéng yué zhen. 


heaven-son self-refer is.called 1SG 

“The Heaven’s Son calls himself zhén.” (Records of the Grand Historian, 1“ century 
B.C.E.) 

Gua-rén SUL Si, yi wu hui yan. 
lacking-person.1SG even.if die also not.have regret = in.it 

“Even if lacking-person (= I) dies, I will have no regret.” (Commentary of Zuo, late 4" 
century B.C.E.) 


As we mentioned earlier, although there are debates over whether some of the ancient terms that are
being revived online had really been used in history,¹⁰ the historical origin/usage of a revived term is
orthogonal to its synchronic categorial identification. It thus suffices to identify a term as a Subtype-I
noncanonical pronominal item based on just two criteria: (i) the term has been borrowed from historical
contexts (either real-life or fictional), and (ii) it synchronically qualifies as a noncanonical pronominal
item. It is based on these criteria that we have identified the items in Table 2. See (10) and (11) for some
real-life examples. Unless otherwise specified, all our Vietnamese examples are taken from Facebook
or Twitter, and our Chinese examples from Sina Weibo.


(10) a. Hôm-nay  có    ai   muốn  rủ      trẫm         đi  uống   cà-phê  không?    [Viet.]
        today    have  who  want  invite  emperor.1SG  go  drink  coffee  NEG
        “Does anyone want to invite emperor (= me) out for a coffee today?”

     b. Trẫm         tha      tội         haha.
        emperor.1SG  forgive  wrongdoing  haha
        “Emperor (= I) forgives (you) haha.”


8 It was common practice for ancient Chinese rulers to use humble terms of self-address, so in this regard zhèn
is an exception, since it sounds authoritative and ruler-like even in archaic Chinese contexts (see, e.g., Oracle
Bone Script Dictionary by Zhongshu Xu). The first emperor of Qin was responsible for the official royalization
of the term according to official historical records (e.g., Records of the Grand Historian) and authoritative
dictionaries (e.g., Kangxi Dictionary, Xinhua Dictionary, Big Dictionary of Chinese Characters).

9 We present historical Chinese examples with Mandarin pronunciation for expository convenience.

10 A quick search in the Chinese Text Project database (the largest online database of premodern Chinese texts)
returns no results for āijiā (while zhèn and guǎ-rén both occur many times), and some modern dictionaries
(e.g., Revised Mandarin Chinese Dictionary) explicitly mark āijiā as a term from traditional Chinese opera.
(11) a. Jīntiān  fāxiàn-le     yī-gēn   bái    húzi,  zhèn      hěn   yōushāng.    [Mandarin]
        today    discover-PRF  one-CLF  white  beard  zhèn.1SG  very  sad
        “Noticed a gray one in the beard today. Zhèn (= I) is very sad.”

     b. Zìcóng  bèi   diào-dào     xīn  bùmén...       zhèn      jiù    méi       zhǔndiǎn
        since   PASS  transfer-to  new  department...  zhèn.1SG  still  not.have  on.time
        xiàbān-guo.
        knock.off-EXP
        “Ever since being put in the new department, zhèn (= I) has never been able to knock off
        on time.”

     c. Suīrán    guǎ-rén             shì  dānshēn  yīzú,      dànshì  gū
        although  lacking-person.1SG  is   single   community  but     lone.1SG
        juéde,  nǚshēng  zuì   xūyào  de    shì  péibàn.
        think   girl     most  need   NMLZ  is   company
        “Although lacking-person (= I) is single by choice, lone (= I) thinks what girls desire is
        company.”

     d. Guǎ-rén             chí-zǎo     huì   sǐ-zài  shèyǒu    yǒngyuǎn  chǎo-bù-xǐng
        lacking-person.1SG  late-early  will  die-at  roommate  forever   wake-not-awake
        zìjǐ  de   nàozhōng     xià.
        self  REL  alarm.clock  underneath
        “Lacking-person (= I) will sooner or later die from my roommate’s alarm clock, which can
        never wake herself up.”


In (10), the term trẫm in both sentences is used to convey a jokingly arrogant tone. Similarly, in (11)
zhèn and guǎ-rén sound funnily bossy and a lot less sad/mad than if the default 1SG wǒ were used. Note
that although Vietnamese trẫm and Chinese zhèn and guǎ-rén were all once state rulers’ terms of
self-address, they have a key difference: while Vietnamese trẫm is only used by male speakers, the two
Chinese terms are used by both male and female speakers more or less equally frequently.¹¹ For instance,
the netizens in (11a, c) and (11b, d) are respectively male and female Weibo users, and they all sound
jokingly pretentious.¹²

There are also predominantly feminine terms in this subtype. See (12) for a Vietnamese example
and (13) for two Chinese examples.


(12) a. Ái-phi             hôm-nay  đẹp     quá!                              [Viet.]
        beloved-concubine  today    pretty  INTS
        “Beloved-concubine (= you) look gorgeous today!”

     b. Chào   ái-phi,            nhớ   ái-phi             quá.
        greet  beloved-concubine  miss  beloved-concubine  INTS
        “Hello beloved-concubine (= you), (I) miss beloved-concubine (= you) a lot.”


11 In particular, zhèn has evidently been gender-neutral throughout history, which may have to do with its original
status in Old Chinese as a default 1SG pronoun. Thus, the Tang-dynasty empress Wu Zetian also referred to
herself as zhèn, as is recorded in official historical documents like New History of Tang (compiled in the 11th
century).

12 Interestingly, in (11c) the speaker mixes guǎ-rén and gū. The latter, literally “lone”, is an alternative to
guǎ-rén and had also been frequently used by rulers in pre-Qin China. This suggests that the revival of ancient
terms of address is a quite general trend online.
(13) a. Jiù-mìng!  Kuài     gěi   běn-gōng         lái    yī-píng               [Mandarin]
        save-life  quickly  give  this-palace.1SG  bring  one-bottle
        sù-xiào-jiù-xīn-wán!
        fast-effect-save-heart-ball
        “Help! Quickly bring this-palace (= me) a bottle of instant cardio-reliever pills!”

     b. Dǎ-qiú    dǎ-de    běn-gōng         yī-gè    gēbo  cū     yī-gè    gēbo  xì.
        hit-ball  hit-RES  this-palace.1SG  one-CLF  arm   thick  one-CLF  arm   thin
        Zěnme  pò?
        how    break
        “Since this-palace (= I) played too much badminton, one of my arms has become much
        thicker than the other. How can I get rid of this?”

     c. Wǒ   yào   bǎ    Táobǎo  xiè-le,        kànjiàn  xīn  yuèjì         jiù
        1SG  will  DISP  Taobao  uninstall-PRF  see      new  Chinese.rose  just
        xiǎng  mǎi...  chénqiè      rěn-bu-zhù      wa!
        want   buy     chénqiè.1SG  endure-not-TEL  EXCL
        “I’ll uninstall Taobao, as I want to buy every new Chinese rose I see... chénqiè (= I) can’t
        help it!”

     d. Xiū-wán      liǎng-tiān  jià,   yòu    yào   huíqù   bān-zhuān    le.
        rest-finish  two-day     break  again  must  return  carry-brick  CRS
        Chénqiè      bù   xiǎng  shàngbān  le.
        chénqiè.1SG  not  want   work      CRS
        “After a two-day break I must return to carry bricks (an idiom for ‘work’) again. Chénqiè
        (= I) don’t want to work anymore.”


In (12), the Vietnamese term ái-phi is used in a funnily flirtatious way to refer to a female addressee.
Like trẫm, ái-phi synchronically retains its original gender (FEM) and is used only by male speakers to
female addressees. This shows virtually no deviation from the term’s historical usage. The situation in
Chinese, by contrast, is much more complex. Specifically, as (13) illustrates, there are at least two
different feminine terms in Subtype I, běn-gōng (13a–b) and chénqiè (13c–d). The pragmatic effect of
běn-gōng is similar to that of āijiā, though it sounds slightly less bossy due to the lower ranking of
emperors’ wives/concubines than their mothers. Chénqiè, on the other hand, sounds humbler and even
a bit miserable due to its historical status as a term of self-address used by low-status females to their
(royal) superiors.¹³ Netizens are well aware of the difference between běn-gōng and chénqiè, which is
reflected in the different contexts of usage in (13a–b) and (13c–d). While běn-gōng is used to jokingly
give orders to imaginary servants (13a) or to express “worries” about one’s imperfect appearance (13b)
(as royal concubines typically did), chénqiè is used for more miserable scenarios, such as overspending
(13c) or overworking (13d).

Moreover, the historical usages of běn-gōng and chénqiè were much broader than their revived
usages. Historically běn-gōng could be used by anyone possessing a (royal) palace, including emperors’
wives, high-ranking concubines, and crown princes. But its modern revival is exclusively based on the
wife/concubine sense, probably due to the omnipresence of this usage in TV series. Similarly, chénqiè
could be used in history by any low-status female when they spoke to royal superiors, including
actresses, prostitutes, plebeians, emperors’ wives/concubines, and even empress dowagers when they
needed to sound humble (Xia 2018). But its online usage is only based on the wife/concubine sense too,
again due to its omnipresence in TV series. In addition, while these terms are predominantly used by
females, they are occasionally also used by males (mostly gay). For instance, a flamboyant gay character
Yu Hao in the TV series Stand by Me constantly refers to himself as āijiā, and a gay vlogger and cosmetics
expert Benny Dong on Bilibili (the Chinese YouTube) regularly calls himself běn-gōng in his video
titles. As for chénqiè, the catchphrase Chénqiè zuò-bu-dào a! ‘I really can’t do it!’¹⁴ is trendy among
netizens of all genders and sexual orientations.

13 The ultimate origin of chénqiè was an Old Chinese compound meaning “slaves” (lit. male slave [chén] and
female slave [qiè]). However, this original sense had long become obsolete, and chénqiè shifted to its feminine
usage as early as the Eastern Han dynasty (25–220 C.E.). See Xia (2018) for a detailed discussion.

A word of caution is in order on the category of běn-gōng. Although its literal meaning “this-palace”
makes it resemble English this reporter, the present author, etc., which are imposters à la Collins and
Postal, we must note that this Chinese term has no R-expression usage or flexible reference. Thus, it
cannot be used in third-person cases like (14a), unlike English this-terms, as in (14b).

(14) a. *Běn-gōng         bǐ           bié-gōng      měi.
         this-palace.3SG  compared.to  other-palace  beautiful
         Intended: “This palace (= she) is more beautiful than other palaces (= other concubines).”
     b. This reporter(3SG) is nicer than that one.

In other words, Chinese běn-gōng is a lexically fixed, idiomatic term of self-address, which makes it
qualify as a noncanonical pronominal item in our criteria.¹⁵


2.2 Subtype II: Dialectal terms 

The second subtype of noncanonical pronominal item we have identified in Vietnamese and Chinese 
involves dialectal terms that have made their way into the common language via mass media (e.g., TV 
programs) or the Internet. See Table 3 for some examples. 


Table 3: Subtype-II noncanonical pronominal items in Vietnamese and Chinese

Vietnamese   hắn ‘3SG’ (from Central and Southern dialects)¹⁶
             y ‘3SG’ (from Northern dialects)

Chinese      ǒu ‘1SG’ (from Min/Yue Chinese)
             é ‘1SG’ (from Shaanxi Mandarin Chinese)
             ǎn ‘1SG’ (from Northern/Central Mandarin Chinese)
             nóng ‘2SG’ (from Shanghai Wu Chinese)
             yā ‘3SG’ (from Beijing Mandarin Chinese)


There are several distinctions between dialectal terms and revived ancient terms. First, while revived 
ancient terms are restricted to the first and the second person, dialectal terms also involve third-person 
items. For Vietnamese in particular, these terms are strictly 3SG (see Alves 2017 for more detail on the 
etymology of these). This is not surprising because dialectal terms are simply default pronouns in their 
original dialects. Second, unlike revived ancient terms, dialectal terms may not have gender restrictions 
at all, and in some cases not even preferences. Third, unlike many revived ancient terms (especially the 
royal ones), dialectal terms generally do not bear arrogant or bossy tones, so their pragmatic effects are 
of a different sort. Consider (15) for Vietnamese. 


14 This is a quote from the highly popular TV series Empresses in the Palace and has gone viral via memes.

15 In this sense běn-gōng patterns more like yours truly and muggins here in English, which have no R-expression
usage or flexible reference either, even though they are classified as imposters in Collins & Postal (2012).

16 Although hắn has been documented in various parts of Vietnam, including the North (Cao 2014), the term was
originally from the Southern dialect (see, e.g., Hoang 1989). It should also be noted that while hắn is mostly
used as a neutral 3SG in the South, it is often used with pragmatic effects in the North (addressed later in this
section).
(15) a. Đi  rồi  không  biết  bao-giờ  hắn  mới  về.          [Vietnamese, Southern speaker]
        go  PRF  NEG    know  when     3SG  PRT  return
        “(He)’s gone. (I) don’t know when s/he would return.”
     b. Không  biết  thì   hỏi  hắn  thử.                      [Southern speaker]
        NEG    know  then  ask  3SG  try
        “If (you) don’t know then try asking him/her.”
     c. Chính      hắn  cục  vàng  của   tui.                  [Northern speaker]
        precisely  3SG  CLF  gold  POSS  1SG
        “It’s precisely him, my piece of gold.”
     d. Lại    nhớ   hắn  à?                                   [Northern speaker]
        again  miss  3SG  Q
        “Are (you) missing him again?”
     e. Mèo  nhà   tao  đã   bắt    y    sáng     nay  rồi.    [standard variety, Internet language]
        cat  home  1SG  PST  catch  3SG  morning  DEM  PRF
        “My cat caught it this morning.”
     f. Tôi  nói  lời   yêu   y,   nhưng  sao  y    không  hiểu?        [standard variety, Internet language]
        1SG  say  word  love  3SG  but    why  3SG  NEG    understand
        “I said loving words to him, but why hasn’t he understood?”


First, the term hắn is specific to Central and Southern dialects of Vietnamese, is a regional variant of
the standard 3SG nó, and has entered the standard variety due to dialect contact. Note that although hắn
is a neutral term in the original dialect, as in (15a–b), it is often used to sound cute/funny by speakers
of other varieties. In (15c), for example, the speaker of Northern Vietnamese uses hắn as an endearing
term to refer to her baby, whom she considers “a piece of gold” in her possession.¹⁷ Similarly, another
Northern Vietnamese speaker in (15d) uses hắn to refer to their interlocutor’s boyfriend, who is
presumably being missed by the interlocutor. The use of hắn in both contexts sounds more fun and cuter
than the standard variant nó.

The next Vietnamese item in this subtype is y, which is originally and mainly used in Northern
dialects to refer to a male criminal. The term is therefore formal, but due to crossdialectal contact it has
now become more widely used online as a jokingly serious pronominal form. In (15e), for example, y
is used to refer to a mouse (who is in this sense cast as a criminal), which makes the sentence much
funnier. Similarly, the speaker in (15f) uses y to complain about her crush, who has not returned her
affection. Like hắn, the use of y in these contexts brings about some dramatic comic effects. Unlike hắn,
however, there is a strong preference for a masculine interpretation particularly when the referent of y
is human. When the referent is nonhuman, y can in principle be used neutrally (e.g., in (15e), we do not
know whether the mouse is male or female).

The Chinese inventory for this subtype is again more diverse. Due to space limitations, we restrict
our detailed description to only three of the Chinese terms from Table 3. All examples in (16) are from
Common Mandarin produced by Weibo users.

17 Cục vàng ‘a piece of gold’ is an idiomatic expression in Vietnamese which is most often used by parents to
refer to their precious children.
(16) a. Ǒu   zhè-ge    yǎn-zhū-zi    hǎo-kàn.                                   [Mandarin]
        1SG  this-CLF  eye-ball-DIM  good-look
        “My eyes are good-looking.”

     b. Wèishénme  ǒu   de    xīn-dòng    nánshēng  hái    bù   chūxiàn?
        why        1SG  POSS  heart-move  guy       still  not  appear
        “Why hasn’t the guy I will fall in love with shown up yet?”

     c. É    dìyī-cì     pèngdào    chǎo-wán      jià      jǔbào   jiā   gào-lǎoshī    de.
        1SG  first-time  encounter  argue-finish  quarrel  report  plus  tell-teacher  NMLZ
        “This is the first time I have ever encountered someone who reports the other person to
        teachers after quarreling with them.”

     d. É    di    guīmì               jìngrán          bǎ    é    zuì-ài     de
        1SG  POSS  best.female.friend  go.so.far.as.to  DISP  1SG  most-love  REL
        xiāngshuǐ  yòng-zuò  kōngqì-qīngxīn-jì!
        perfume    use-as    air-freshen-agent
        “My bestie outrageously used my favorite perfume as air freshener!”

     e. Nǐ   hé    zhǔrén  shuō  yī-jù    ràng  tā   shuān  shéng,  yā       zhāng-kǒu
        2SG  with  owner   say   one-CLF  let   3SG  tie    rope    3SG.OFF  open-mouth
        jiù   xiàoxīxī  hé    nǐ   shuō:  “wǒ   jiā    gǒu  bù   yǎo   rén.”
        just  giggle    with  2SG  say    1SG   house  dog  not  bite  person
        “You ask the dog owner to tie their dog up, and they(3SG) (= that asshole) just giggle and tell
        you: ‘My dog does not bite.’”

     f. Nǐ   yuè       gēn   yā       shuō  hǎo-tīng-de       yā       yuè       lái-jìn.
        2SG  the.more  with  3SG.OFF  say   good-listen-NMLZ  3SG.OFF  the.more  come-strength
        Yīshànglái              bǎ    yā       mà-xiàqù...  hái    néng  yǒu   diǎn  yòng.
        right.at.the.beginning  DISP  3SG.OFF  scold-down   still  can   have  some  use
        “The more kind words you say to them(3SG) (= that asshole), the more shameless they(3SG) (=
        that asshole) become. It would be more effective if you simply swear back and tell them(3SG)
        (= that asshole) to get out of the car right away.”


The three terms in (16) have respectively been borrowed from Min/Yue Chinese (ǒu), Shaanxi
Mandarin Chinese (é), and Beijing Mandarin Chinese (yā). The first item, ǒu, is a regional variant of
Mandarin wǒ and (re)entered Common Mandarin due to netizens’ mocking of the dialectal
pronunciation. According to Chen (2009:215), it was the most popular mutant personal pronoun online
in the noughties. Perhaps due to its initial role as a mocking term, ǒu sounds funny and cute and is often
used by netizens who want to appear jolly and adorable. The current usage of ǒu is no longer for
mocking purposes. Thus, the netizen in (16a) happily posts about her satisfaction with the look of her
eyes without the intention to mock anyone, and the netizen in (16b) laments her single status with a
puppy-face-like tone. In both sentences, the use of ǒu instead of the default 1SG wǒ makes the speakers
sound more likable and less boastful or whiny.

The second item, é (sometimes rendered as é or ngé) became widely known in the noughties via
the popular TV series My Own Swordsman, in which the leading actress spoke Shaanxi Mandarin
throughout the eighty episodes. Due to the comedic nature of that show and the fussy personality of its
main character, the term has subsequently gained a jokingly fussy tone in Internet language. Thus, the
netizen in (16c) is making a fuss about the base behavior of a tattletale student, and that in (16d), about
her best friend’s improper use of her perfume. No such fussy tone would be present if the default 1SG
wǒ were used instead. In addition, in (16d) the possessive marker di is also borrowed from Shaanxi
Mandarin, whose Standard Mandarin counterpart is de. This again reflects the stylistic agreement
mentioned in footnote 6.

A special note is in order concerning the Beijing Mandarin term yā, which is originally a highly
vulgar expression meaning “child of a girl with no recognized marital status”. It is etymologically short
for yātou-yǎng-de ‘low.status.girl-raise-NMLZ’, but nowadays this literal meaning is obsolete, and yā is
mainly used as an offensive suffix attached to pronouns and demonstratives (e.g., nǐ-yā ‘2SG-OFF’,
nèi-yā ‘that.person-OFF’).¹⁸ That said, it has developed a stand-alone pronominal usage as well. Crucially,
in this usage, it can only be interpreted as 3SG (gender neutral). Thus, the netizens in (16e–f) respectively
complain about the bad behavior of a dog owner and that of a taxi customer. Since yā has no synchronic
R-expression usage related to its pronominal usage¹⁹ and has a lexically fixed person index, we treat it
as a noncanonical pronominal item instead of an imposter.


2.3 Subtype III: Creative online coinages 
The third subtype of noncanonical pronominal item we have identified in Vietnamese and Chinese 
involves items that do not fall in the previous two subtypes. These are mainly creative coinages on the 
Internet and so in a sense “native” to Internet language. Since such online coinages are not based on 
any particular type of source, their lexical materials are miscellaneous or even totally novel. See Table 
4 for some examples. 

Note that all three Vietnamese examples in Table 4 are second-person terms, which carry different 
pragmatic effects due to the lexical materials they recycle. See (17) for some real-life examples. 


Table 4: Subtype-III noncanonical pronominal items in Vietnamese and Chinese

Vietnamese   cưng ‘dear.2SG’
             con-quỷ ‘devil.2SG’
             người-đẹp ‘beautiful.person.2SG’

Chinese      qīn ‘dear.2SG’
             bén-lii/bén-lu/bén-lu ‘this-loser.1SG’
             lúnjiā ‘others.1SG’


(17) a. Cưng      muốn  gì    từ    anh       nào?                          [Viet.]
        dear.2SG  want  what  from  1SG.MASC  AFFECT
        “What do you (= dear) want from me?”

     b. Con-quỷ    đang  làm  gì    đó?
        devil.2SG  PROG  do   what  DM
        “What are you (= devil) doing?”

     c. Cảm-ơn  người-đẹp             đã   mở    hàng.
        thank   beautiful.person.2SG  PST  open  shop
        “Thank you (= beautiful person) for being the first customer today!”


All three boldfaced terms in (17) are creative coinages by market sellers, which are now widely used
thanks to online marketing. Similar to noncanonical pronominal items in the other two subtypes, those
in Subtype III encode special pragmatic effects too. For example, the uses of cưng and con-quỷ as 2SG
terms in (17a–b) sound deliberately cute and friendly (and possibly a little flirtatious), while the use of
người-đẹp in (17c) is flattering/fashionable. These terms are very creative and have no fixed lexical
sources. Similarly, see (18) for some real-life examples of Chinese terms in this subtype.


18 A de can be optionally added to these terms (e.g., nǐ-yā-de), which is a residue of the nominalizer in the full
form.

19 The character for yā (丫) does record other meanings too, such as “branch, twig” or more generally any
Y-shaped object like the front part of a foot, but those are irrelevant to the pronominal yā. Thus, such polysemy
is qualitatively different from that in cases like “teacher” as an R-expression and “teacher” as a term of address.
(18) a. Qīn       kěyǐ  shōucáng  wǒ-men  de    diànpù  hé   liànjiē...          [Mandarin]
        dear.2SG  can   save      1SG-PL  POSS  store   and  link
        xūyào  de   shíhou  zài    liánxì   ne.
        need   REL  time    again  contact  EMP
        “Dear (= you) can save our store and link and contact us again when you have need.”

     b. Suīrán  běn-lu      méi       qián   méi       míng,  dàn  wǒ   ānquán
        though  this-loser  not.have  money  not.have  fame   but  1SG  safety
        yìshí      yīliú.
        awareness  first-class
        “Although this loser (= I) has no money or fame, I have first-class safety awareness.”

     c. Lúnjiā      zhēndeshì  huīchánghuīcháng  xǐhuan  Máo  Bùyì  o.
        others.1SG  really     very.very         like    Mao  Buyi  EXCL
        “Others (= I) really like Mao Buyi very very much!”


Qīn originated on the shopping website Taobao as a friendly term of address between sellers and
customers. Thus, in (18a) the Taobao seller says “dear can...” instead of “you can...” to encourage the
customer to save their online store. Běn-lu and its tonal variants are all coined by combining the deictic
běn ‘this’ and the first syllable of English loser. It has the pragmatic effect of self-mocking.²⁰ Thus, in
(18b) the netizen self-mockingly takes pride in his safety awareness despite his poor status. Lúnjiā is a
deliberately distorted variant of rénjiā ‘others’ and sounds cute and jocular when used as a 1SG term.
Thus, in (18c) the netizen expresses her obsession with the pop singer Mao Buyi. Also note the form
huīcháng in (18c), which is a distorted variant of the degree adverb fēicháng ‘very’ and adds to the
cuteness of the utterance. This is another instance of the aforementioned stylistic agreement (see
footnote 6).

Since this is the first systematic documentation of such creative pronominal coinages to our
knowledge, we want to give a bit more detail on the above terms to justify our identification of them as
noncanonical pronominal items. First, we are aware that qīn has R-expression usages, partly due to the
versatility of its lexical root, which can mean “parent, kin (n.)”, “intimate, dear (adj.)”, “kiss (v.)”, etc.
Moreover, one of its R-expression usages is closely related to its pronominal usage. Thus, one can
refer to someone (a third person) in a friendly way as qīn, as in (19).

(19) Wēibó  hái    yǒu   qīn     zài    ma?                                    [Mandarin]
     Weibo  still  have  dear.N  be.at  Q
     “Is there still anyone(friendly) on Weibo?”


However, we do not treat qīn as an imposter because its pronominal usage has evidently developed
from its term-of-address usage, which is also its predominant usage on Taobao (see, e.g., Deng 2012
and Liu 2012). In fact, previous studies rarely mention the usage exemplified in (19), which suggests
that it might be a more recent development from either the pronominal or the term-of-address usage. In
any event, the 2SG qīn is not a pronominally used R-expression in nature and thus does not fit the
canonical situation of imposters.

Second, just like the Subtype-I term běn-gōng ‘this-palace.1SG’, běn-lu also contains a deictic běn,
but we treat it as a noncanonical pronominal item instead of an imposter because it too can only refer
to the speaker (but not a third person) and has no R-expression usage, as exemplified for běn-gōng in
(14).

Third, the phonological distortion that has created lúnjiā (rén → lún) has brought along some
interesting change to its syntactic status. On the one hand, while both rénjiā and lúnjiā can be used as
1SG terms, only rénjiā has a separate 3SG usage (i.e., “others”), hence the ambiguity of (20a). By
contrast, lúnjiā can only refer to the speaker, hence the ungrammaticality of the 3PL reading in (20b).


(20) a. Bié    zhuāng   le!   Rénjiā          yòu  bù   shì  shǎzi.             [Mandarin]
        don’t  pretend  CRS   others.3PL/1SG  EMP  not  COP  idiot
        “Stop pretending! Others (= they/I) are not idiots.”
     b. Bié    zhuāng   le!   Lúnjiā           yòu  bù   shì  shǎzi.
        don’t  pretend  CRS   others.*3PL/1SG  EMP  not  COP  idiot
        “Stop pretending! Others (= *they/I) are not idiots.”

20 This term, especially its běn-lū variant, is mainly used by males, since the Chinese character usually adopted
to represent lū (撸) also means “(male) masturbate”, which further adds to the self-mocking effect of the term.


On the other hand, while rénjiā has a shy or embarrassed tone when used as a term of self-address,
lúnjiā furthermore sounds adorable and cartoon-like. Thus, while (20a) sounds like real blaming (with
an embarrassed tone in the 1SG reading), (20b) sounds like the speaker is just teasing the addressee.


2.4 Interim summary 

Based on Vietnamese and Chinese data, we have identified and exemplified a new type of pronominal 
item that is emerging in the Internet era, which we have dubbed noncanonical. As we have shown, 
noncanonical pronominal items differ from both textbook default pronouns and imposters in nontrivial 
ways. Crucially, unlike imposters, they have lexically fixed referents and no common R-expression 
usage, and unlike default pronouns they have various pragmatic effects. Overall, though, they pattern 
almost identically to default pronouns in syntax except for their extragrammatical effects. So, we 
tentatively rename noncanonical pronominal items “noncanonical pronouns” and give them the 
following working definition: 


(21) Noncanonical pronouns are syntactically well-behaved pronouns with extragrammatical effects. 


Moreover, the extragrammatical effects in noncanonical pronouns are not associated with conventional
sociolinguistic factors like the relationship or relative hierarchical status between speakers and
addressees.²¹ In fact, most examples we have given are not even from interpersonal communication but
from online posts. Rather, the extragrammatical (e.g., register, tone) effects we have observed are more
typically associated with the mind-sets and personalities of individual netizens themselves. To illustrate,
adjectives we have used to describe the special effects of Vietnamese and Chinese noncanonical
pronouns include the following:


(22) joking, jocularly arrogant, fun, funnily bossy, jokingly pretentious, funnily flirtatious, miserable,
cute, jokingly serious, dramatic, jolly, adorable, puppy-face-like, jokingly fussy, offensive,
deliberately cute, friendly, flattering, fashionable, self-mocking, cartoon-like, teasing


These descriptions are highly compatible with the Internet register. For instance, while it would sound 
bizarre or even off-putting if an adult keeps talking in a dramatic, cute, or cartoon-like tone in reality, 
this is totally fine and acceptable on the Internet. In a similar vein, while gender does play a role in 
regulating the use of noncanonical pronouns, at least in Chinese, it is apparently one’s gender identity 
(or sexual orientation) rather than their biological sex that guides their choices of noncanonical terms. 
This is another state of affairs increasingly normal in contemporary Chinese society, especially on the 
Internet. 

Perhaps due to the unique features of the Internet as a modality of communication and the
somewhat similar technological context it has endowed netizens around the world with, we have
observed striking similarities in Vietnamese and Chinese noncanonical pronouns not only in their usage
but also in their lexical sources. Specifically, for both languages we have identified three major types
of noncanonical pronouns based on their evolution pathways: revived ancient terms, dialectal terms,
and creative online coinages. Also, for certain terms (e.g., trẫm, ái-phi), we have even observed
mass-media-based crosslinguistic borrowing.


21 So noncanonical pronouns are different from the kind of interpersonal-relationship- or social-hierarchy-based
pronominal items typically seen in East and Southeast Asian languages like Japanese, Korean, and Thai.

On the other hand, Vietnamese and Chinese show two main differences in their noncanonical pronouns.
First, they differ in the sizes of their respective inventories, with Chinese having more actively used
terms in almost every subtype. Second, they differ in the person/gender/number propensities of certain
terms or even entire subtypes. For instance, Vietnamese terms like trẫm (male) and ái-phi (male to
female) show rather strict gender-based usage, and both Subtype-II and Subtype-III terms in
Vietnamese show person/number restrictions (to 3SG and 2SG respectively). Whether these restrictions
are categorical or due to the limited size of our data set requires further investigation, but our
observation so far suggests that while Vietnamese has a more developed and less restricted imposter
system than Chinese (see footnote 4), Chinese has a more developed and less restricted noncanonical
pronoun system than Vietnamese. What this contrast means is an intriguing point of future research,
but in the rest of this article we focus on the formal-grammatical status and syntactic representation of
noncanonical pronouns.


3. Taxonomy 

Before presenting our formal syntactic analysis, we first make a brief remark on the taxonomy of 
pronominal items. At first sight, an intuitive way of classifying pronominal items is to build on the 
conventional textbook system. Since textbook pronouns are deemed standard or default, nontextbook 
pronominal items could be termed alternative or nondefault. Then, under the alternative class, we further 
set two subclasses: imposters and noncanonical pronouns. We illustrate this taxonomy with the diagram 
in Figure 1 and call it the usage-based taxonomy, since properties like conventional/nonconventional 
and canonical/noncanonical are from the perspective of language use. Also, to give the taxonomy a bit 
more systematicity, we recast the two binary-branching nodes in terms of two yes-no questions. 


Figure 1: A usage-based taxonomy of pronominal items 


            [Is the item a textbook default?]
               yes /              \ no
            default      [Does the item have common R-expression usage?]
                               yes /              \ no
                           imposters          noncanonical


However, from a grammatical perspective the taxonomy in Figure 1 is obviously flawed, because even 
though imposters and noncanonical pronouns are both nondefault pronominal items, grammatically 
speaking noncanonical pronouns resemble default pronouns to a much greater extent than they resemble 
imposters. So, in a grammatically more precise taxonomy default and noncanonical pronouns should 
be grouped under the same class at some level. To this end, we propose the syntactically based 
taxonomy in Figure 2, which is also accompanied by two yes-no questions defining the two binary- 
branching nodes. 


Figure 2: A syntactically based taxonomy of pronominal items 


          [Does the item have common R-expression usage?]
                   no /              \ yes
   [Does the item have idiosyncratic effects?]    imposters
          no /              \ yes
       default          noncanonical
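
The two yes-no questions in Figure 2 amount to a small decision procedure. The following Python sketch is only an illustration under stated assumptions: the class name `PronominalItem`, its two Boolean fields, and the romanized example forms (written without tone marks) are hypothetical conveniences and not part of the analysis itself.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PronominalItem:
    """A pronominal item described by the two questions in Figure 2."""
    form: str
    has_r_expression_usage: bool      # Question 1: common R-expression usage?
    has_idiosyncratic_effects: bool   # Question 2: idiosyncratic effects?


def classify(item: PronominalItem) -> str:
    """Walk the two binary-branching nodes of the syntactically based taxonomy."""
    if item.has_r_expression_usage:
        return "imposter"
    if item.has_idiosyncratic_effects:
        return "noncanonical pronoun"
    return "default pronoun"


# Toy items; the feature values follow the descriptions in the text.
ITEMS = [
    PronominalItem("wo", False, False),     # Mandarin default 1SG
    PronominalItem("zhen", False, True),    # revived imperial 1SG
    PronominalItem("laoshi", True, False),  # 'teacher' used as an imposter
]

for item in ITEMS:
    print(f"{item.form}: {classify(item)}")
```

Note that the first question is asked first, so default and noncanonical pronouns fall under the same branch and differ only in the answer to the second question.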


Incidentally, the syntactically based taxonomy is also more desirable from an acquisitional perspective, 
since the two questions in Figure 2 can both be easily answered based on the primary linguistic data a 
child has direct access to, whereas the first question in Figure 1 hinges on more sophisticated knowledge 
about the world, more exactly about language textbooks and reference grammars. According to the 
principles and parameters approach to linguistic variation and diversity (Chomsky 1981), especially its
neo-emergentist incarnation (Biberauer 2017), children acquire grammatical knowledge based on 
positive evidence in the significant acquisitional input (e.g., high-frequency recurring forms and 
collocations in everyday adult speech). So, a certain grammatical phenomenon is acquired early (or at 
least is acquirable) if it regularly and saliently exists in the primary linguistic data and therefore can be 
easily detected by the acquirer. Building on this theoretical background, a further prediction Figure 2 
makes is that in languages where imposters are a regular part of the grammar, such as Vietnamese, 
imposters are acquired earlier than default pronouns.²² We leave the verification of this claim to future
research. 


4. A formal syntactic analysis 

After laying out the comparative data (§2) and the syntactically based taxonomy (§3), in this section we
propose a formal syntactic analysis of noncanonical pronouns within the minimalist program (Chomsky
1995 et seq.), more exactly within the generalized root syntax theory put forth in Song (2019). The
purpose of this analysis is threefold: (i) to explain why noncanonical pronouns behave the way they do
in a formally explicit way, (ii) to improve the current syntactic theory of pronouns and make it empirically
more adequate, and (iii) to tentatively explain why noncanonical pronouns have restricted
crosslinguistic distribution. Before presenting our particular analysis (§4.2), we first introduce the
theoretical background of pronominal syntax (§4.1).


4.1 Pronouns in generative syntax 

As we mentioned in §1, previous syntactic studies of pronouns have mainly focused on default pronouns,
especially those that constitute exponents of phi features. Within the Chomskyan school, since the 
proposal of the DP hypothesis (Abney 1987), default pronouns have been associated with the category 
D (or its more elaborate equivalents) in one way or another. Thus, Abney (building on Postal 1966) puts 
pronouns at the D head position and argues that they project DPs on their own (i.e., without 
specifier/complement elements). Then, with the popularization of the split-functional-projection idea, 
which was initiated by Pollock’s (1989) split-IP hypothesis and Rizzi’s (1997) split-CP hypothesis in 
the verbal domain, many authors have proposed elaborate hierarchical structures for the nominal 
domain as well (e.g., Ritter 1995, Cardinaletti and Starke 1999, Neeleman and Weerman 1999, Borer 
2005, Ritter and Wiltschko 2019). In relation to pronouns, quite a few authors have put forth the idea 
that they may correspond to different parts or zones in the nominal tree. 

For instance, Ritter (1995) proposes two kinds of pronouns, which respectively occupy D and Num.
Déchaine and Wiltschko (2002) propose three kinds of pronouns, which they name pro-DP, pro-φP,
and pro-NP. Like Ritter, Déchaine and Wiltschko also let pronouns occupy head positions. But that is
not the only solution. There are also researchers who propose that pronouns realize entire nominal
tree(let)s (e.g., Weerman and Evers-Vermeul 2002, Neeleman and Szendrői 2007). As Neeleman and
Szendrői suggest, this approach is more natural in late spell-out frameworks like distributed morphology
(Halle and Marantz 1993 et seq.), in which syntactic computation operates on formal features, whose
phonological realization is only dealt with at the syntax-phonology interface. Since such a
computation-before-pronunciation view is also more generally endorsed in the minimalist program, we accept it as a
background assumption without further discussion. That is, we assume that both terminal and
nonterminal nodes may be spelled out by phonological units. This means that what looks like a single
word or even morpheme on the surface may be a complex hierarchical structure in the underlying syntax
and that generative syntax can well handle this type of phenomenon (by means of nonterminal
spell-out).

Special attention needs to be paid to two recurring issues in previous generative analyses of 
pronouns. The first issue is the division of labor in the elaborate structure of pronouns. Take Déchaine 
and Wiltschko’s pro-DP structure for example. 


22 However, Figure 2 does not predict that default and noncanonical pronouns are acquired simultaneously, 
because the latter are a novel phenomenon in Internet language and not yet part of the grammatical knowledge 
relevant for first language acquisition or parameter setting. 
(23) Pro-DP (illustrated by Halkomelem tú-tl’ò ‘DET-3SG’; Déchaine and Wiltschko 2002:412)
           DP
          /  \
         D    φP
         |   /  \
        tú  φ    NP
            |    |
          tl’ò   ∅

In this structure, both D (for definiteness) and φ (for person/number) are overt, but NP is empty.
Déchaine and Wiltschko hypothesize that the NP position may be either overt or null, and that when it
is filled with lexical material we get a normal [determiner noun] phrase, such as (24).


(24) Tl’ó-cha-l-su qwemcíwe-t [thú-tl’ò q’ami]ARG. [Halkomelem]
     then-FUT-1SG-so hug-TRANS DET.FEM-3SG girl
     “Then I’m going to hug that girl.” (Galloway 1993:174, via Déchaine and Wiltschko 2002:412)


However, Déchaine and Wiltschko do not specify what the semantic contribution of NP is when it is 
null as in (23). A similar scenario occurs in Ritter and Wiltschko (2019), where an even more elaborate 
nominal domain is proposed, which has three zones—a lexical, a functional, and an interactional zone— 
each subsuming a number of categories. Take their analysis of German du ‘2SG’ in (25) for example. 


(25) German pronoun du ‘2SG’ (Ritter and Wiltschko 2019:3)
          Speech Act Structure
                  |
                  DP
                /    \
               D      PhiP
             [per]   /    \
                   Phi     nP
                  [num]   /  \
                         n    NP
                       [gen]


The authors do not discuss the spell-out procedure, but their theory is compatible with a late spell-out 
approach (Wiltschko, p.c.), so we can assume that the phonological unit du somehow realizes the whole 
tree in (25). Again, we see concrete semantic contributions of each syntactic category except NP, though 
it presumably has to be there since otherwise the tree is not lexically grounded. 

Lexical grounding is precisely the second issue in previous syntactic studies that we would like to 
invite readers to pay attention to. While both (23) and (25) have a functional-above-lexical scaffolding, 
as is standardly assumed in current generative syntax, this is not always the case in earlier studies of 
pronouns. For example, the following trees from Abney (1987) and Ritter (1995) have no lexical bases 
(i.e., NP). 
(26) a. Abney (1987:284)    b. Ritter (1995:419)     c. Ritter (1995:421)
           DP                     DP                       DP
           |                      |                       /  \
           D                      D                      D    NumP
           |                   [person]            [definite]   |
           we                  [number]             [person]   Num
                               [gender]                   hu/hi/hem/...
                            (1st/2nd-person          (3rd-person
                            pronouns in Hebrew)      pronouns in Hebrew)


It is unclear how such trees can be generated in minimalism, in which functional categories by definition 
build on or extend lexical categories. In fact, in some minimalist theories, such as distributed 
morphology, the lexical grounding even provides the categorial feature for the entire tree (Roberts 2019 
has a similar idea). 

We draw readers’ attention to the above two issues because they are
prevalent in the literature and deserve reflection if we want to guarantee the implementability of
particular theories in the minimalist program (e.g., in the fashion of Collins and Stabler 2016). That 
said, we do not think the theories cited above are inherently problematic. Rather, they are just of a lower 
granularity level, where certain details are glossed over. As a granularity-raising move, we find an idea 
in Harbour (2016) helpful. Abstracting away from the technical details, Harbour’s core idea is that the 
individual variable (a semantic primitive), which any further nominal semantic function (be it person, 
number, or whatever else) relies on, is introduced before the further formal features are introduced. He 
lets the individual variable be introduced at the N level and the further features be introduced at the D 
(or split-D) level, as in (27). 


(27) Semantic division of labor in the nominal structure (adapted from Harbour 2016:77)
           DP
          /  \
         D    N
       [phi]  ...λx...


The overall effect of (27) is that the phi features in D together yield a particular set of individuals, which
the individual variable in N ranges over.²³ Eventually, some other features in the nominal structure pin
down the referent and assign it to the individual variable x. Acquaviva (2019) independently proposes
a similar idea in an even more fine-grained theory, where the N part is further decomposed into a
nominalizer n and a root in the sense of distributed morphology, and it is the nominalizer that introduces
the individual variable that serves as the semantic grounding of the whole nominal phrase.

Assuming the above high-granularity details in the background, we can now safely take any of the 
aforementioned theories of default pronouns as a point of departure without worrying about shaky 
foundations. In fact, which specific DP structure we adopt is immaterial to our own analysis as long as 
it does not suffer from foundational problems and has the necessary components to derive a syntactically 
well-behaved default pronoun. Therefore, when the DP-internal structure is inconsequential, we simply 
use the very-low-granularity label DPpro²⁴ to indicate a pronominal DP (as Aldridge 2021 does in her
study on Old Chinese pronouns). 

Having seen how default pronouns are typically analyzed in generative syntax, we now build our 
analysis of noncanonical pronouns on top of that, because as we mentioned in §2.4, noncanonical 
pronouns are just default pronouns equipped with idiosyncratic extragrammatical effects. Since they 


23 Harbour’s tree is highly abbreviatory and cannot be taken at face value for issues like labeling (Chomsky
2013 et seq.). We assume that the tree in (27) automatically gets the label DP once the omitted details are filled
back in.

24 We use DPpro as an umbrella label for the entire (pro)nominal structure, which abbreviates not only the lexical
and functional zones but also the Wiltschkovian interactional zone if that is present.
behave like default pronouns in syntax, they must involve the same sort of syntactic structure as default
pronouns, whatever that structure is. What our analysis mainly serves to explain, therefore, is how the 
idiosyncratic effects are formally associated with that structure. 


4.2 Generalized root syntax 

To begin with, note that the special effects of noncanonical pronouns come from the particular terms 
themselves, more exactly from the idiosyncratic content associated with their lexical materials, rather 
than from the context. This is because they show the same effects in all examples. For instance, 
Vietnamese tram and Chinese zhèn (both 1SG) sound jokingly arrogant wherever they appear, and they
have this effect precisely because they were once used by emperors. Similarly, Vietnamese cưng and
Chinese qīn (both 2SG) sound deliberately friendly, whether in commercial discourse or not, and this effect
has to do with the terms’ original lexical meaning “dear”. An even more interesting case is Chinese lunjia
‘others.1SG’, which not only takes some lexical material (rénjia ‘others.1SG/3SG/3PL’) but also distorts
it (rén→lun), and the distortion in turn serves to distinguish the new noncanonical usage from the old
imposter usage. The key mechanism involved in all these cases of noncanonical pronouns is thus the 
recycling of existing lexical material for new grammatical purposes (i.e., a kind of grammaticalization). 
This process is evident in all three subtypes of noncanonical pronouns in §2. It is also a manifestation 
of a fundamental strategy in human language and cognition, which Biberauer (2017) terms “maximize 
minimal means” (MMM). In Biberauer’s (2017:41) words, MMM is both “a generally applicable 
learning bias harnessed by the acquirer during acquisition” and “a principle of structure building” that 
“facilitat[es] the kind of efficient computation and ... the self-diversifying property that allows human 
language to be the powerful tool that it is” (see Biberauer 2011 et seq. for more background on this line 
of thought). 

Song (2019) develops a “generalized root syntax” to tackle half-grammatical-half-lexical 
vocabulary items, of which the noncanonical pronouns studied here are a specific instance. Song’s 
theory is an extension of the root theory in distributed morphology (hence its name), where content 
words like nouns and verbs are decomposed into a functional part, called the categorizer, and a purely 
lexical part (which does not have a syntactic category), called the root. Thus, the noun dog is analyzed
as [n √DOG], and the verb run is analyzed as [v √RUN]. The idea is that all that participates in formal
computation is essentially functional-categorial when the representation is fine-grained enough, while 
idiosyncratic information like lexical sound/meaning and encyclopedic knowledge is sealed in a 
syntactically inert capsule that is only opened when the syntactic representation is sent to the 
phonological/semantic interfaces for interpretation. This lexical decompositional practice pushes 
syntactic methods to the traditional morphological arena, so distributed morphology is also known as a 
“syntax all the way down” approach.²⁵

While accepting the lexical decompositional approach of distributed morphology, Song points out 
that its particular view on root categorization is flawed, for it stipulates that only traditional lexical 
categories can serve as categorizers, but that assumption leads to theory-internal contradiction under 
close scrutiny (see Song 2019:102 for details). Since from a formal perspective the categorization 
procedure just serves to equip the otherwise inert root with a syntactically active shell, logically 
speaking any functional category can do the job, and the specialness of traditional lexical categories 
(i.e., the little x categorizers in distributed morphology) merely lies in their bare-predicate- 
making/typing semantics. That is, they introduce typed individual variables in the sense of Harbour 
(2016) and Acquaviva (2019): 


25 A caveat here is that distributed morphology does not predict that any root-categorizer merger can yield a 
legitimate vocabulary item (which is a common straw man in criticism of the framework). Rather, the 
interpretability of particular root-categorizer combinations is a matter of language-specific lexicalization (in a 
broad sense of the term), and root syntax merely offers a tool to structurally represent and analyze such 
lexically stored information. A major achievement of root syntax, in hindsight, is that it has pushed syntactic
theory to a higher granularity level and thereby formalized a further aspect of regularity in human language 
(i.e., the very basic phenomenon of categorization). We deem this a significant step forward. 
(28) a. N-categorizer (“nominalizer”): introduces an entity-type individual variable
     b. V-categorizer (“verbalizer”): introduces an eventuality-type individual variable
     c. X-categorizer: introduces an x-type function


Thus, root categorization is more generally root support; namely, the enrichment of a functional 
category with some idiosyncratic information encapsulated in a root. For instance: 


(29) a. dog = [n √DOG] = an entity-type individual that is called /dog/, has four legs, can bark, etc.
     b. run = [v √RUN] = an eventuality-type individual that is called /rʌn/, involves leg-moving,
        etc.


From the perspective of syntacticosemantic computation, only the underlined information in (29) (the
typed individual) is relevant, while the rest merely supports this skeletal information and makes it suitable
for postsyntactic purposes (describing the world, communication, etc.). In other words, the extragrammatical
effects of content words are precisely their idiosyncratic root content. Extending this mechanism to X-categorizers,
what we obtain is a functional category equipped with idiosyncratic extragrammatical effects; namely, 
a half-grammatical-half-lexical item. Song (2019) gives miscellaneous examples from Chinese to 
illustrate this, such as those in (30). 


(30) a. passive auxiliaries: bèi ‘lit. cover, suffer (neutral)’, gěi ‘lit. give (colloquial, negative)’
     b. classifiers: jiàn ‘lit. item (for clothes, etc.)’, duǒ ‘lit. flower (for flowers, clouds, etc.)’
     c. conjunctions: hé ‘lit. union (general purpose)’, yǔ ‘lit. accompany (formal, literary)’


These items can all be analyzed as a functional category supported by a root (e.g., [Voicepass √BEI]).
The categorial part determines their syntactic functionality, while the root part determines their
extragrammatical effects and thereby conditions their real-life usage (e.g., the “and” in Harry Potter
and ... is yǔ instead of hé).

Returning to noncanonical pronouns, the same analysis applies. Since we have established that the 
syntactic behavior of noncanonical pronouns is the same as that of default pronouns, we treat them as 
root-supported Dpro items. However, given the elaborate DP structures in §4.1, we next clarify three 
technical details to show how our analysis fits into the big picture of pronominal syntax. 


4.3 Deriving noncanonical pronouns 

First, following Nunes and Uriagereka (2000), Johnson (2003), and especially Zwart (2007 et seq.), we 
assume syntactic derivation to be multilayered. That is, derivational products of an earlier 
cycle/workspace can be used in a subsequent cycle, probably in an “atomized” fashion (Fowlie 2013). 
This is an inevitable state of affairs if we look closely at the assembling of syntactic trees. At each step 
of Merge, the object that is newly selected from the lexical (sub)array into a workspace by definition 
has not undergone any Merge step in that workspace (so it is a “minimal” category in a relative sense). 
While this is natural in the merger of a primitive category with an existing phrase, a question arises in 
the merger of a specifier or adjunct with a phrase: Where has the specifier/adjunct (which is also phrasal 
by definition) been derived? The only logical possibility is that it has been derived in another workspace 
before being selected and merged in the current workspace. The upshot is that “layered derivation” 
(Zwart’s term) must be a standard mechanism in minimalist syntax. We contend that this is also what 
happens in the derivation of noncanonical pronouns. 
(31) a. Workspace I          b. Workspace II                 c. Workspace III
           DPpro                     DPpro                          VoiceP
           /   \                     /   \                          /    \
          D    NumP              DPpro    √                    DPpro-√   VoiceP
               /  \                                                      /   \
             Num   NP                                                Voice    VP
        (atomize DPpro)       (support DPpro with a root)     (merge the root-supported DPpro as a specifier)
                              (atomize the root-supported DPpro)


The above derivation involves three consecutive layers, each defined by a workspace. In (31a), a default 
pronominal phrase is built up in Workspace I. We use the DP-NumP-NP structure as an example, but 
as mentioned in §4.1, any workable DP structure is fine for current purposes. In (31b), the assembled 
and atomized pronominal DP is selected into Workspace II and merged with a root. This is the 
categorization step. As a result, the root is assigned the category D and the D category is associated 
with the idiosyncratic content in the root. Finally, in (31c) the root-supported pronominal DP is selected 
into Workspace III and merged as a specifier. The particular scenario here is Spec-VoiceP; namely, the 
subject of a transitive verb. 
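
The three layers in (31) can be mimicked with a toy implementation. The sketch below is a minimal illustration under stated assumptions, not a formalization of the cited proposals: the names `Node`, `merge`, `atomize`, and `show` are hypothetical, labels are supplied by hand rather than computed by a labeling algorithm, and atomization is modeled simply as discarding internal structure.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Node:
    """A syntactic object: a label plus (possibly zero) daughters."""
    label: str
    children: Tuple["Node", ...] = ()


def merge(label: str, left: Node, right: Node) -> Node:
    """Binary Merge; the label is supplied by hand in this sketch."""
    return Node(label, (left, right))


def atomize(node: Node, label: str = "") -> Node:
    """Export a finished phrase as an opaque atom for the next workspace."""
    return Node(label or node.label)


def show(node: Node) -> str:
    """Bracketed representation of a tree."""
    if not node.children:
        return node.label
    return "[" + node.label + " " + " ".join(show(c) for c in node.children) + "]"


# Workspace I: build a default pronominal phrase, then atomize it.
dp_pro = merge("DPpro", Node("D"), merge("NumP", Node("Num"), Node("NP")))
atom_1 = atomize(dp_pro)

# Workspace II: support the atomized DPpro with a root, then atomize again.
supported = merge("DPpro", atom_1, Node("√ZHEN"))
atom_2 = atomize(supported, "DPpro-√")

# Workspace III: merge the root-supported atom as the specifier of VoiceP.
clause = merge("VoiceP", atom_2, merge("VoiceP", Node("Voice"), Node("VP")))

print(show(dp_pro))     # [DPpro D [NumP Num NP]]
print(show(supported))  # [DPpro DPpro √ZHEN]
print(show(clause))     # [VoiceP DPpro-√ [VoiceP Voice VP]]
```

Because each atom has no daughters, the internal D-NumP-NP structure is invisible in Workspace II, and the root is invisible in Workspace III, which mirrors the claim that the root content is syntactically inert in the main derivation.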

Second, the derivation in (31) does not hinge on the concrete content in the supporting root, since
that is syntactically inert anyway. This means that miscellaneous lexical materials can be (re)used at the
root slot, as long as they are properly lexicalized, with their original formal features (if any) reanalyzed as
lexical features. As such, not only simple roots like √ZHEN but also derived roots like √AI-JIA can be
used to support DPpro. The key prediction here is that this recategorized āijiā can only be used as a
pronoun but not as a common R-expression anymore, even though its original meaning clearly is
R-expressional (i.e., “mourner”). As we observed in §2, this is generally true for noncanonical pronouns.

Third, the tree in (31a) is just that of a default pronoun, which in principle can have its own overt form.
For instance, if the pronoun is 1SG, then it in principle can be spelled out as the default wǒ in Chinese.
Yet this never happens with noncanonical pronouns. That is, noncanonical pronouns do not allow
appositive default pronouns, which sharply contrasts them with imposters, as in (32).²⁶


(32) a. Lǎoshī (wǒ) kuài yào shīqù wǒ de nàixìng le. [Mandarin, imposter]
        teacher.1SG 1SG quick going.to lose 1SG POSS patience CRS
        “Teacher1SG (I) is going to lose my patience.” (adapted from Wang 2014:185)

     b. Zhèn (*wǒ) kuài yào shīqù wǒ de nàixìng le. [noncanonical pronoun]
        zhèn.1SG 1SG quick going.to lose 1SG POSS patience CRS
        “Zhèn1SG (*I) is going to lose my patience.”


Intuitively, zhèn is just wǒ with an alternative pronunciation and some idiosyncratic effects, so saying
zhèn wǒ is like saying “I I” in English, which is clearly ungrammatical. How can the syntactic derivation
in (31) bear this out, though? The observation, in conditional terms, is the following:


(33) i. If a functional category is root-supported, it assumes the root’s exponent (possibly distorted).
ii. If a functional category is non-root-supported, it assumes its default exponent (if any). 


There are two ways to explain (33). One way is to view it as a concomitant of the categorization 
procedure. Recall that the classic case of categorization in distributed morphology is that of content 
words. In this case the uncategorized root has no fixed pronunciation and only gets one when it is 
assigned a category. For instance, the English root √PERMIT may be pronounced as /pər-ˈmit/ or /ˈpər-ˌmit/
depending on its category. This is even more evident in languages like Hebrew, where uncategorized
roots cannot be vocalized at all (e.g., √K-T-B ‘related to writing’, √L-M-D ‘related to learning’). The


26 While imposters may co-occur with appositive default pronouns in Chinese, this is impossible in Vietnamese. 
theory for this in distributed morphology is that each step of categorization in syntax corresponds to a
step of retrieving stored phonological/semantic information at the interfaces. And crucially, it is the
[categorizer root] unit as a whole, not its subparts, that gets assigned that information. As such, it is
plausible that each noncanonical pronoun has its own lexical entry, and when a root-supported DPpro
like (31b) is interpreted, it is assigned the stored pronunciation, which in most cases is the same as that
of the recycled lexical material (again a manifestation of MMM) but may also involve certain distortion
as in lunjia.

Another way to explain (33) is to invoke Kiparsky’s (1973) elsewhere principle, which basically 
says that when a more general and a more specific rule are adjacent, the more specific rule applies. For 
instance, the past tense form of go is went instead of *goed because the irregular rule is more specific. 
Neeleman and Szendrői (2007) use the elsewhere principle to explain radical pro drop. We leave out
the details of their application for space limitations but merely cite a well-known implication of the 
principle that they list: 


(34) All else being equal, the phonological realization of syntactic structures favors spell-out of a 
category C over spell-out of the categories contained in C. (Neeleman and Szendrői 2007:685)


This immediately explains why a root-supported functional category gets pronounced differently from
a non-root-supported one: in a tree like (31b) the spell-out rule that targets [DPpro √] as a whole
blocks the rule that targets DPpro itself. We can reformulate (33) as follows to better reflect the elsewhere
principle.


(35) i. If a functional category is root-supported, it assumes the root’s exponent (possibly distorted).
ii. Elsewhere it assumes its default exponent (if any). 


In any event, the different pronunciations of default pronouns and noncanonical pronouns, in spite of 
their partially identical underlying structures, well conform to independently motivated rules in 
generative syntax. 
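
The elsewhere-style formulation in (35) can be sketched as a lookup in which a specific entry for a [category root] unit blocks the default exponent of the category. The following Python sketch is illustrative only: the two dictionaries form a hypothetical toy lexicon (romanized without tone marks), the key format `DPpro[1SG]` is our own notation, and the fallback to the default exponent when no root-specific entry is stored is an assumption of the sketch.

```python
from typing import Dict, Optional, Tuple

# Toy lexicon; the entries are illustrative assumptions.
DEFAULT_EXPONENTS: Dict[str, str] = {"DPpro[1SG]": "wo"}
ROOT_SUPPORTED_EXPONENTS: Dict[Tuple[str, str], str] = {
    ("DPpro[1SG]", "ZHEN"): "zhen",       # exponent identical to the recycled material
    ("DPpro[1SG]", "REN-JIA"): "lunjia",  # exponent distorted (ren -> lun)
}


def spell_out(category: str, root: Optional[str] = None) -> Optional[str]:
    """Return the exponent of a functional category, following (35)."""
    if root is not None:
        specific = ROOT_SUPPORTED_EXPONENTS.get((category, root))
        if specific is not None:
            return specific  # (35i): the more specific rule wins
    return DEFAULT_EXPONENTS.get(category)  # (35ii): elsewhere case


print(spell_out("DPpro[1SG]"))             # default pronoun
print(spell_out("DPpro[1SG]", "ZHEN"))     # root-supported, undistorted
print(spell_out("DPpro[1SG]", "REN-JIA"))  # root-supported, distorted
```

Since the specific entry is consulted first and returns a single exponent for the whole unit, the sketch never produces a sequence like zhen wo for one root-supported DPpro, in line with the contrast in (32).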


4.4 Crosslinguistic availability 

We mentioned in §1 that noncanonical pronouns have a much more limited crosslinguistic distribution 
than imposters. Our foregoing analysis may explain why this should be the case. Song (2019:136–137)
points out that root-supported heads are “analytic heads” because they have a very low 
category/morpheme-per-word ratio (respectively 1:1 and 2:1), where a word is understood as a 
morphophonologically freestanding unit. And Chinese-style semifunctional items (like those in (30)) 
are furthermore “analytic heads par excellence” because their category-per-word ratio and
morpheme-per-word ratio are both 1:1, where a morpheme is understood in the traditional sense as a minimal
sound-meaning pair. As such, by analyzing noncanonical pronouns in generalized root syntax, we 
automatically get the following prediction: 


(36) Noncanonical pronouns are more common in highly analytic languages. 


This may explain why we can easily find noncanonical pronouns in Vietnamese and Chinese but not in 
familiar European languages—both Vietnamese and Chinese are highly analytic languages, which have 
the right grammatical setting for root-supported heads to stably exist. Note that we are not claiming that 
high analyticity is solely defined by root support. In fact, if there is a high analyticity parameter at all, 
that is very likely to be codefined by a cluster of smaller parameters or grammatical settings (see Huang 
2015 for a discussion). Here we are merely stating that a highly analytic language has a
root-support-friendly setting. For instance, among others it may have scarce head movement, which means it has
more standalone words than affixes, and root support just provides a convenient way to create 
standalone grammatical words. Of course, we need to check more highly analytic languages to verify 
(36), which we leave to future research. 
5. Conclusion

In this paper, we studied a new type of pronominal item emerging on the Internet in Vietnamese and 
Chinese. First, we demonstrated that pronominal items of this new type, which we call noncanonical 
pronouns, are a separate category from both textbook default pronouns and imposters (§1). Then, we
illustrated their real-life usage in detail (§2). Our investigation showed that noncanonical pronouns in 
the two Asian languages are similar not only in syntactic behavior but also in lexical sources and 
evolution pathways. Syntactically, both Vietnamese and Chinese noncanonical pronouns behave like 
default pronouns except that they have various extragrammatical effects, mostly reflecting the speaker’s 
mind-set or personality. Lexically, both languages subsume three subtypes of noncanonical pronouns 
based on their evolution pathways: revived ancient terms, dialectal terms, and creative online coinages. 
After presenting the empirical facts, we briefly discussed the taxonomy of pronominal items and 
specified why we prefer a syntactically based taxonomy (§3). 

In the theoretical part of the paper, we analyzed noncanonical pronouns in the theory of generalized 
root syntax (§4), which is an extension of the root theory in distributed morphology (a branch of the 
minimalist program). Specifically, we analyzed noncanonical pronouns in the schema [DPpro √], where
DPpro is a separately derived and atomized default pronoun structure and √ is a purely lexical root
supporting that structure. The root part may be constituted by terms (re)lexicalized from the three
sources above. After the (re)categorization, the root-supported structure DPpro-√ is selected into the main
workspace and merged onto the main tree, where it behaves like a default pronoun in syntax but triggers 
idiosyncratic phonological and semantic properties at the interfaces, just as we have observed in 
Vietnamese and Chinese. Alongside our analysis, we clarified a number of theoretical and technical 
issues, such as how to use previous theories of pronominal syntax in a well-founded way, why
cross-workspace or layered derivation must be allowed in generative syntax, and why default and
noncanonical pronouns with the same pronominal structure can have totally different overt forms. 
Finally, we also tentatively explained why noncanonical pronouns have limited crosslinguistic 
availability in terms of the correlation between root support and high analyticity. 

Due to limited scope, we have had to leave some interesting questions to future research, including 
but not limited to the acquisitional order predicted by our taxonomy in §3 and the availability of 
noncanonical pronouns in other highly analytic languages as predicted in §4. We also observed a
quasi-complementary contrast between Vietnamese and Chinese in §2 concerning the distribution of
noncanonical pronouns and imposters. So, as a future plan we would also like to further compare 
Vietnamese and Chinese imposters and look into questions such as why imposters are used more freely 
in Vietnamese (footnote 4), why they may co-occur with appositive default pronouns in Chinese but 
not in Vietnamese (footnote 26), and so on. 


Abbreviations 

1/2/3 = first/second/third-person 
AFFECT = affectionate 

ARG = argument 

CLF = classifier 

COP = copula 

CRS = currently relevant state 
DEM = demonstrative 

DET = determiner 

DIM = diminutive 

DISP = disposal 

DM = discourse marker 

EMP = emphatic 

EXCL = exclamative 

EXP = experiential 

FEM = feminine 

FUT = future 

HON = honorific 

IMPERF = imperfective 
INTS = intensifier 
MASC = masculine 

N = noun 

NEG = negation 

NMLZ = nominalizer 
Num = number 

OFF = offensive 

PL = plural 

POSS = possessive 

PRF = perfective 

PROG = progressive 
PRT = particle 

Q = question marker 
REL = relative clause marker 
RES = resultative 

SG = singular 

TEL = telic marker 
TRANS = transitive 


Abstract 

In this study, I address the grammatical nature of Sino-Vietnamese words borrowed from 
Chinese verb-object compounds (or líhécí in Chinese). In Chinese, this type of compound 
has several idiosyncratic characteristics, and previous studies of Chinese grammar suggest 
that they have dual status as words and phrases. In Vietnamese, similar to other lexical 
items, a number of verb-object compounds have been borrowed from Chinese. I conducted 
two experiments to investigate whether the Sino-Vietnamese “verb-object compounds” 
retain separability and object restriction. The results show that, unlike the original forms, 
they cannot be separated by other morphemes, and that some of them became transitive verbs 
during or after borrowing, in contrast to the original intransitive forms in 
Chinese. These findings indicate that the Sino-Vietnamese “verb-object 
compounds” have almost completely lost the phrasal status attested in their original 
forms and retain only their word status. 


Keywords: Vietnamese, Chinese, verb-object compounds, grammatical borrowing 
ISO 639-3 codes: vie, cmn, yue, jpn 


1 Introduction 

It is well known that Vietnamese, which belongs to the Vietic branch of the Austroasiatic family, has been 
heavily influenced by Chinese through long-term contact between the two languages. Previous studies 
such as Alves (2017) and Shimizu (2017) demonstrate that the impact of Chinese is remarkable at the 
phonological and lexical levels of Vietnamese, and many studies on Sino-Vietnamese words (i.e., 
loanwords from Chinese) have paid attention to their phonological and/or lexicosemantic characteristics 
for decades. In contrast, a smaller number of studies have focused on the grammatical (i.e., morpho- 
syntactic) nature of Sino-Vietnamese words because it is believed that the grammatical influence of 
Chinese is more subtle than its phonological and lexicosemantic effects. In exploring lexical borrowing 
from Chinese to Vietnamese in the early and pre-modern eras (i.e., spoken borrowing during the first 
millennium CE and borrowing through Chinese writing after the Tang dynasty), Alves (2007a:343) 
argued that many of the grammatical characteristics typical of varieties of Chinese are not part of 
Vietnamese grammar, and that the grammatical influence of Chinese is primarily lexical rather than 
structural. However, Alves (2007b) suggested that a number of Sino-Vietnamese words acquired new 
grammatical meanings or functions after borrowing, which means that they underwent internal 
grammaticalization in Vietnamese. Along these lines of research, Washizawa (2019) investigated the internal 
grammaticalization processes of some Sino-Vietnamese words from the 16th to the 19th centuries. 

As for the grammatical borrowing from Chinese into Vietnamese, I have noticed that few studies 
have examined the synchronic nature of Sino-Vietnamese words and compared it with that of the 
original forms in Chinese. Therefore, for this study, I focused on the grammatical characteristics of 
Sino-Vietnamese “verb-object compounds” (VO compounds), which belong to the ‘Sino-neologisms’ 
largely borrowed at the beginning of the 20th century to translate modern Western concepts (Vinh 
1993, Alves 2007a, 2017). According to reference grammars of Chinese (e.g., Chao 1968, Li and 
Thompson 1981), the original VO compounds have several idiosyncratic features; thus, they are 
suitable for considering the issues related to the nature of grammatical borrowing from Chinese into 
Vietnamese. 


2 VO compounds in Chinese 

In Chinese, there is a special class of lexical compounds called “VO compounds” (or líhécí 离合词). Li 
and Thompson (1981:73) suggested that they consist of two constituents standing in the syntactic relation 
of a verb and its direct object, but that they differ from verb-object phrases in the following respects. 
First, VO compounds contain one or two bound morphemes. For example, the second morphemes of 
shàng-shì 上市 (ascend-city) ‘come to market’ and lǐ-fà 理发 (arrange-hair) ‘have a haircut’ are bound 
forms because they are not normally used alone in modern Chinese and must be combined with another 
morpheme, as in chéng-shì 城市 (city-city) ‘city’ and tóu-fa 头发 (head-hair) ‘hair’. These facts signal 
that these VO expressions are not verb phrases, but compounds. 

Second, the meaning of these Chinese-style VO compounds is idiomatic at the level of the entire unit 
(Li and Thompson 1981:73). In other words, the meaning of the whole is not fully derivable from that of the 
constituents. For example, shàng-shì 上市 can be used to indicate that some fruits or vegetables are 
sold in the market because they are in season, which is not predictable from shàng ‘ascend’ and shì 
‘city’. Similarly, lǐ-fà 理发 refers to cutting one’s hair, which is slightly different from the combined 
meaning of the morphemes lǐ ‘arrange’ and fà ‘hair’. 

Third, and most significant to this study, even though they have a compound-like nature, as stated 
above, the vast majority of VO compounds allow their constituents to be separated by other morphemes 
(Li and Thompson 1981:73). For example, shàng-shì 上市 and lǐ-fà 理发 can be separated by 
various morphemes, such as the perfective marker le 了 and the experiential marker guo 过, as shown 
in (1a) and (1b), respectively. 


(1) Separation of VO compounds in Chinese 

a. Separation by the perfective aspect marker le 了 (Packard 2016:76) 

   上       了     市 
   shàng   le     shì 
   ascend  PERF   city 
   ‘came to market’ 

b. Separation by the experiential aspect marker guo 过 (Li and Thompson 1981:75) 

   他    还     没    理        过    发 
   Tā    hái    méi   lǐ        guo   fà 
   3SG   still  not   arrange   EXP   hair 
   ‘S/He still hasn’t ever had a haircut’ 


Previous studies of Chinese grammar have paid much attention to this property and have discussed the 
grammatical status of VO compounds for decades (Lu 1957, Chao 1968, Lü 1979, Packard 2016, etc.). 
Packard (2016:76) summarized the current consensus as follows: “Li2he2ci2 are best viewed as a kind 
of word (词 ci2— i.e., a morphologically complex verb) with one or two bound constituents, but a word 
whose constituents are subject to syntactic reanalysis as free elements in certain limited contexts”. This 
shows that VO compounds have dual status as words and phrases, depending on the context in 
which they occur. 

Li and Thompson (1981:76) pointed out another important feature of VO compounds in Chinese: 
the vast majority of them cannot be followed by a direct object. For 
example, in (2), fēn-lèi ‘classify’ (2a) cannot be followed by a direct object such as dòngwù ‘animals’ (2b); 
instead, the noun must appear as the object of the preceding preposition gěi 给 
‘to’, as shown in (2c). 


(2) Object restriction in the VO compound fēn-lèi 分类 

a. 分类 
   fēn-lèi 
   divide-class 
   ‘classify’ 

b. *分类           动物 
   *fēn-lèi       dòngwù 
   divide-class   animal 
   ‘classify animals’ 

c. 给     动物      分类 
   gěi   dòngwù   fēn-lèi 
   to    animal   classify 
   ‘classify animals’ 


Some previous studies have found that not all Chinese VO compounds are subject to this constraint. 
For example, Maruo and Han (2018) demonstrated that some words, such as liú-xué 留学 ‘study abroad’ 
and chū-xí 出席 ‘participate in’, can be followed by an object. 

VO compounds are found not only in Mandarin but also in other Chinese dialects. For 
example, Cheung (1972/2007) and Matthews and Yip (1994/2011) described VO compounds in 
Cantonese, which have grammatical features similar to those of Mandarin. In (3a), the VO compound 
duhk-syū 读书 ‘study’ can be separated by the progressive aspect marker gán 紧. In (3b), an object-like 
word, ngóh 我 ‘first person singular pronoun’, cannot occur after the VO compound bōng-mòhng 
帮忙 ‘help’; instead, the pronoun must be used as the possessor of the second morpheme mòhng ‘busy’. 


(3) VO compounds in Cantonese (Cheung 2007:91; Matthews and Yip 2011:58-59) 

a. 读书           读      紧     书 
   duhk-syū      duhk    gán    syū 
   study-book    study   PROG   book 
   ‘study’       ‘(be) studying’ 

b. 帮忙           帮      我     嘅    忙 
   bōng-mòhng    bōng    ngóh   ge    mòhng 
   help-busy     help    1SG    LP    busy 
   ‘help’        ‘help me’ 


3 Sino-Vietnamese “VO compounds” 

In addition to many other lexical items, a large number of VO compounds have been borrowed from 
Chinese into Vietnamese. Although the grammatical nature of Sino-Vietnamese “VO compounds” has 
not been well studied, several studies in Chinese as a Second Language (CSL) partially address this issue. 
For example, Ngô (2007) showed that Vietnamese translation equivalents of Chinese VO compounds 
generally cannot be separated by other morphemes, and that their transitivity is not identical to that of 
VO compounds in Chinese. Nguyễn (2019) also investigated the transitivity of Vietnamese verbs having 
the same meaning as Chinese VO compounds, and suggested that some of them are transitive verbs, 
which leads to grammatical errors when Vietnamese learners use VO compounds in Chinese. These 
CSL studies provide important insights into the grammatical nature of Sino-Vietnamese “VO 
compounds”. However, we cannot identify the systematic characteristics of the compounds since both 
Ngô (2007) and Nguyễn (2019) analyzed mixed data, including both Sino-Vietnamese and Vietic words. 
It is necessary to focus only on Sino-Vietnamese words to grasp the nature of grammatical borrowing 
from Chinese to Vietnamese. Therefore, in this study, I only deal with Sino-Vietnamese “VO 
compounds”, which were largely borrowed in the early 20th century, and compare their grammatical 
aspects to those of the original forms in Chinese. 


4 Methods and results 

I conducted two types of experiments: a separation test and an object test. The former explores whether the 
constituents of Sino-Vietnamese “VO compounds” can be separated by function words, and the 
latter investigates whether the compounds can be followed by an object. 


4.1. The separation test: Can Sino-Vietnamese “VO compounds” be separated by other elements? 
Before conducting the experiments, I selected the target words in the following manner. First, I collected 
268 VO compounds in Chinese from a líhécí dictionary by Zhou (2011) and determined that 91 
compounds were listed as Sino-Vietnamese words in Kawamoto (2011), which is the largest 
Vietnamese-Japanese dictionary with Chinese characters. I then showed these 91 words to a female 
native speaker of Vietnamese (hereafter, S1) and asked her to comment on her familiarity with each 
word. She regarded 47 words as “familiar and commonly used”; thus, I selected them as the 
targets of the experiments, which are shown in Appendix 1. 

The female speaker S1 participated in the separation and object tests. She was born in 1985 and 
had lived in Hanoi since childhood except for six years in Japan. I conducted the experiments in Hanoi 
in August of 2019. 

The separation test allowed us to determine whether the constituents of the target words could be 
separated by three function words: được ‘fortunately, successfully, with a good result’, lại ‘repeating, 
doing over’, and hết ‘be completely finished, all used up, all gone, etc.’, which can be inserted between 
a verb and an object in verb phrases (Thompson 1965), as displayed in (4). 


(4) Insertion of function words được, lại, and hết between a verb and an object (Thompson 1965:268, 
345, 347) 

a. được: fortunately, successfully, with a good result 
   Ông ấy   mua   được   một   ngôi   nhà     to. 
   he       buy   GR     one   CL     house   big 
   ‘He purchased a large villa.’ 

b. lại: repeating, doing over 
   Lúc    bạn      tôi   ra   Hà Nội,   tôi   ở          lại     Sài Gòn   làm 
   when   friend   I     go   Hanoi     I     in(verb)   again   Saigon    do 
   ‘At the time my friend went to Hanoi, I remained in Saigon working.’ 

c. hết: be completely finished, all used up, all gone; no longer; completely, to the very end 
   Anh ấy   tiêu      hết      cả    tiền    rồi. 
   he       consume   finish   all   money   PERF 
   ‘He has spent all the money already’ 


I asked S1 whether it was possible to insert the three function words between the first and second 
constituents of each target word. The results indicated that no target words could be separated by them, 
with one exception: phát được điện ‘to be able to generate electricity’ from phát điện (generate-electricity) 
‘to generate electricity’. S1 created an example sentence, as in (5). 


(5) Cái   máy       này    phát       được   điện. 
    CL    machine   this   generate   GR     electricity 
    ‘This machine can generate electricity.’ 


Note that, in Vietnamese, the constituents phát and điện are full words rather than bound 
morphemes, as suggested to me by Mark Alves (p.c.) and Sho Yamaoka (p.c.). However, S1 reported that the other 
two function words, lại and hết, could not be inserted in the same position. 


4.2. The object test: Can Sino-Vietnamese “VO compounds” take an object? 
After the separation test, I conducted the object test with S1 in the same session. In this experiment, I asked 
her whether each target word could be followed by an object. To test this, I often 
formed example phrases or sentences in which target words were followed by related object words, and 
S1 judged their grammaticality. If the expressions were grammatical to her, I asked her to create another 
example phrase or sentence with an object. After the experiment, I found that S1’s answers were 
sometimes inconsistent and seemed to differ slightly from the predictions inferred from reference 
grammars and dictionaries of Vietnamese. For this reason, I thought it necessary to collect additional 
data from another native Vietnamese speaker to reach a valid conclusion. However, I could not conduct 
the full experiments due to the Covid-19 pandemic as of early 2020; thus, I only performed a partial 
test with the second native speaker of Vietnamese (hereafter, S2) in January of 2021 via Zoom. The 
second speaker, S2, was born in 1993 and lived in Vung Tau in Southern Vietnam for 18 years, and 
then studied at a university in Ho Chi Minh City for four years. After graduating from the university, 
she stayed in Japan as a graduate student; at the time of the experiment, she was still studying at a 
graduate school in Japan. 

Before conducting the second experiment with S2, I presented her with 47 original target words 
(see Appendix 1) and asked her to comment on her familiarity with each word. She regarded 32 words 
as “familiar and commonly used” for her; therefore, I selected them as the new target words for the 
object test. In the experiment, S2 judged whether each target word could take an object in the same 
manner as in the experiment with S1. 
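
The selection of target words described in Sections 4.1 and 4.2 narrows the word list in several steps. The following Python sketch simply recomputes, from the counts reported above, what share of the words survives each step. The variable and function names and the stage labels are illustrative assumptions, not part of the original study.

```python
# Counts as reported in Sections 4.1 and 4.2; the stage labels are illustrative.
selection_stages = [
    ("Chinese VO compounds in Zhou (2011)", 268),
    ("listed as Sino-Vietnamese in Kawamoto (2011)", 91),
    ("familiar to S1", 47),
    ("familiar to S2", 32),
]


def retention_rates(stages):
    """Return (label, count, share of the previous stage) for each stage after the first."""
    rates = []
    for (_, prev_count), (label, count) in zip(stages, stages[1:]):
        rates.append((label, count, count / prev_count))
    return rates


for label, count, share in retention_rates(selection_stages):
    print(f"{label}: {count} words ({share:.1%} of the previous stage)")
```

Such a tally only restates the reported counts; it makes no claim about why words were dropped at each step.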

The combined results collected from S1 and S2 can be classified into five groups. First, S1 and S2 
agreed that seven words could take an object, as seen in Table 1. 


Table 1: Group 1 (target words for which the speakers accepted object noun phrases) 


Word           Meaning                Example 
                                      (Words in bold are object words) 
bảo mật        keep-secret            Bảo mật thông tin 
               ‘keep a secret’        ‘keep information’ 
chiêu sinh     invite-student         Chiêu sinh các lớp học 
               ‘recruit students’     ‘recruit students for each class’ 
gia công       add-work               Gia công máy móc 
               ‘process’ (verb)       ‘process a machine’ 
phát bệnh      generate-disease       Phát bệnh tim 
               ‘become ill’           ‘get heart disease’ 
phân công      divide-work            Phân công công việc này 
               ‘divide the work’      ‘divide this task’ 
phân loại      divide-kind            Phân loại động vật 
               ‘classify’             ‘classify animals’ 
tốt nghiệp²    finish-job             Tốt nghiệp trường đại học 
               ‘graduate’             ‘graduate from university’ 


According to Zhou (2011:107), in Chinese, only jiā-gōng 加工 (add-work) ‘to process’, which corresponds 
to the Sino-Vietnamese gia công, can take an object, while the remaining six VO compounds cannot. 
These results imply that some Sino-Vietnamese “VO compounds” changed their transitivity during or 
after the borrowing. 

The second group includes seven words on which the two native speakers disagreed: S1 
judged that all of them could take an object, while S2 judged that none of them could. I searched for 
additional information about the words in Vietnamese dictionaries and the SEAlang Library 
Vietnamese Text Corpus, and found that at least four of them (i.e., điểm danh ‘call the roll’, nhập học 
‘enter a school’, thi công ‘build’, and từ chức ‘resign’) are attested with an object, as presented in 
Table 2. 


2 Tốt nghiệp is a common Sino-Vietnamese word borrowed from zú-yè 卒业 (finish-job) ‘to graduate’. This 
word is not included in Zhou (2011); instead, bì-yè 毕业 (finish-job) ‘to graduate’ is listed there. However, in 
this study, I adopted tốt nghiệp as a target word because zú 卒 and bì 毕 have the same meaning ‘to finish’. 


Table 2: Group 2 (target words for which the speakers had different judgements; SEA: SEAlang 
Library Vietnamese Text Corpus) 


Word           Meaning                        Example from dictionaries and a corpus 
                                              (Words in bold are object words) 
báo danh       report-name 
               ‘enroll’ 
bế mạc         close-curtain 
               ‘close (a ceremony, etc.)’ 
điểm danh      call-name                      Điểm danh học sinh (Chu et al. 2015) 
               ‘call the roll’                ‘call the students’ 
khai mạc       open-curtain 
               ‘open (a ceremony, etc.)’ 
nhập học       enter-study                    khi tôi nhập học đại học ... (SEA) 
               ‘enter a school’               ‘When I entered university...’ 
thi công       carry out-work                 Thi công khu nhà ở cao tầng (Nguyễn and Phi 2013) 
               ‘build’                        ‘build a high-rise housing area’ 
từ chức        resign-job                     Từ chức hiệu trưởng (Nguyễn and Phi 2013) 
               ‘resign’                       ‘resign as principal’ 


The third group includes two words that can be followed by a verb phrase rather than an object noun 
phrase. The two native speakers agreed that phạm tội can be followed by a verb phrase denoting 
criminal behavior, such as buôn bán ma túy ‘to sell drugs’, and that tuyên thệ can take a verb phrase 
expressing the content of the declaration. In the examples in Table 3, the verb phrases seem to 
be complement clauses rather than parts of serial verb constructions. 


Table 3: Group 3 (target words for which the speakers accepted verb phrase adjuncts) 


Word           Meaning                Example 
                                      (Words in bold represent verb phrases) 
phạm tội       violate-crime          Phạm tội buôn bán ma túy. 
               ‘commit a crime’       ‘commit drug sales’ 
tuyên thệ      declare-notify         Tuyên thệ sẽ làm gì đó. 
               ‘declare’              ‘declare that (I/you/he/she) will do something’ 


The fourth group includes five words that can be followed by a prepositional phrase. In Table 4, 
the prepositions về ‘about, on’, đến ‘about, on, over’, and với ‘to, together with, against’ (Nguyễn 1997:162) 
are used between the target verbs and their related nouns. This group shares grammatical 
characteristics with the corresponding VO compounds in Chinese: as noted in (2), VO compounds in 
Chinese generally use a prepositional phrase in place of an object noun phrase. 


Table 4: Group 4 (target words for which the speakers accepted prepositional phrase adjuncts) 


Word           Meaning                    Example 
                                          (Words in bold represent prepositional phrases) 
an tâm         be satisfied-heart         an tâm về tương lai 
               ‘be relieved’              ‘be relieved about the future’ 
hành quân      go-army                    hành quân đến [tên địa điểm] 
               ‘(troops) march’           ‘march to [a place name]’ 
kết hôn        connect-marriage           kết hôn với anh ấy 
               ‘marry’                    ‘marry him’ 
ly hôn         leave-marriage             ly hôn với Nam 
               ‘divorce’                  ‘divorce Nam’ 
nhập cảnh      enter-border               nhập cảnh vào Việt Nam 
               ‘enter into a country’     ‘enter into Vietnam’ 


Finally, 11 words cannot be followed by any related elements, such as 
object nouns, verb phrases, or prepositional phrases. The intransitive nature of these 11 words is 
consistent with that of the corresponding VO compounds in Chinese. 


Table 5: Group 5 (target words for which the speakers accepted no adjuncts) 


Word           Meaning                    Word           Meaning 
bãi công       cease-work                 sinh bệnh      get-disease 
               ‘go on strike’                            ‘get ill’ 
biến chất      change-quality             thất học       lose-study 
               ‘go bad’                                  ‘be deprived of education’ 
nhập viện³     enter-hospital             thất nghiệp    lose-job 
               ‘be in the hospital’                      ‘be unemployed’ 
phá sản        break-property             trực ban       be on duty-shift 
               ‘go bankrupt’                             ‘be on duty’ 
phạm pháp      violate-law                xuất cảnh      go out-border 
               ‘break the law’                           ‘leave the country’ 
phát điện      generate-electricity 
               ‘generate electricity’ 


Table 6 summarizes the overall results of the object test; the members of Group 2 are excluded because 
their transitivity requires further investigation. As Table 6 shows, although a majority of the target words remain 
intransitive, at least eight Sino-Vietnamese words (i.e., six words in Group 1, excluding gia công ‘to 
process’, and two words in Group 3) differ from their original Chinese forms in transitivity. 


Table 6: Summary of the object test 


May take object      May take verb        May take prepositional    May not take 
noun phrases         phrase adjuncts      phrase adjuncts           any adjuncts 
7                    2                    5                         11 
Different from Chinese                    In common with Chinese 
(except for gia công ‘to process’) 
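
The counts in Table 6 can be cross-checked against the group sizes reported for Tables 1 to 5. The short Python sketch below does this under two assumptions: that the five groups together exhaust the 32 target words tested with S2, and that the label “Different from Chinese” in Table 6 spans the first two columns. The dictionary keys and variable names are hypothetical labels introduced only for this illustration.

```python
# Group sizes as reported for Tables 1-5; the keys are illustrative labels.
group_sizes = {
    "group1_object_noun_phrase": 7,
    "group2_speakers_disagree": 7,
    "group3_verb_phrase": 2,
    "group4_prepositional_phrase": 5,
    "group5_no_adjunct": 11,
}

# All five groups together (assumed to cover every target word tested with S2).
total_targets = sum(group_sizes.values())

# Table 6 leaves out Group 2, whose transitivity is still unresolved.
summarized = total_targets - group_sizes["group2_speakers_disagree"]

# Words differing from Chinese in transitivity: Group 1 without gia công
# (already transitive in Chinese), plus Group 3 (assumed reading of Table 6).
differs_from_chinese = (
    group_sizes["group1_object_noun_phrase"] - 1
) + group_sizes["group3_verb_phrase"]

print(total_targets, summarized, differs_from_chinese)
```

Under these assumptions the tally agrees with the figures in the text: 32 target words, 25 words summarized in Table 6, and eight words that differ from their Chinese counterparts.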


5 Discussion and conclusion 
In this study, I conducted two experiments to investigate the separability and transitivity of Sino-Vietnamese 
“VO compounds”. In this section, I discuss their grammatical features and compare them 
with those of the original VO compounds in Chinese. 


3 In the experiments, I adopted nhập viện, borrowed from rù-yuàn 入院 (enter-hospital), as a target word instead 
of zhù-yuàn 住院 (live-hospital), which is listed in Zhou (2011), because nhập viện is a common word in Vietnamese. 


In the separation test, almost none of the target words could be separated by functional morphemes, 
which shows that they have almost entirely lost the phrasal status attested in the VO compounds 
in Chinese. As observed in Section 2, the dual status of Chinese VO compounds is one of the 
idiosyncratic characteristics in the grammatical system of the language. However, their borrowed forms 
in Vietnamese only have status as compound-like words. 

Meanwhile, in the object test, a number of target words showed different transitivity from their 
original forms in Chinese. As demonstrated by Li and Thompson (1981), in Chinese, the majority of 
VO compounds behave like intransitive verbs and cannot take an object. In contrast, a number of target 
Sino-Vietnamese words behave like transitive verbs, which means they changed their transitivity 
during or after borrowing. Although it is not clear what factors determine the transitivity of 
each Sino-Vietnamese “VO compound”, it may be the case that some cross-linguistic semantic factors 
play an important role because, as shown in (6), the transitivity of Sino-Vietnamese “VO compounds” 
is basically the same as that of Sino-Japanese ones. In (6a), the Sino-Japanese bunrui, borrowed from 
fēn-lèi 分类 ‘classify’, can take an object noun phrase (i.e., noun + the accusative particle o) and, in 
(6b), hasan, borrowed from pò-chǎn 破产 ‘bankrupt’, is available only as an intransitive verb. These 
Sino-Japanese words are entirely consistent with the corresponding Sino-Vietnamese words phân loại 
‘classify’ and phá sản ‘bankrupt’ in transitivity. We will evaluate the importance of this semantic factor 
in the future. 


(6) The transitivity of Sino-Japanese “VO compounds” bunrui and hasan 

a. dōbutsu   -o     bunrui     -suru 
   animal    ACC    classify   do 
   ‘classify animals’ 

b. kono   kaisha    -wa    hasan      -shita 
   this   company   TOP    bankrupt   did 
   ‘This company went bankrupt.’ 


Finally, let us consider the historical process of borrowing VO compounds. It is worth noting that the 
majority of target words in my experiments belong to ‘Sino-neologisms’ borrowed into Vietnamese at 
the beginning of the 20th century. Referring to Vinh (1993), Alves (2007a, 2017) pointed out that the 
borrowed items in this period contained a mixture from both Japan and China because the Japanese had 
been translating Western concepts by utilizing Chinese lexical material in the late 1800s and early 1900s. 
In this study, it is necessary to confirm whether the target words were borrowed from China or Japan 
since several target words have the same characteristics as the corresponding Sino-Japanese “VO 
compounds” in separability and transitivity. According to the word list in Luo (2018), which includes 
disyllabic Sino-Vietnamese words originating from countries other than China, 
only thi công ‘build’ among my target words was created outside of China (i.e., in Japan). Chen (2019) 
showed that in his database, which includes 1,028 common verbs in Chinese, only 47 (4.57%) were 
borrowed from Japanese, and that the borrowing rate of verbs was much lower than that of nouns 
(11.39%) and adjectives (7.7%). Based on the findings of these previous studies, it may be the case that 
verbs created in the Japanese language were not the main sources of Sino-neologisms, and that the 
Vietnamese borrowed very few verbs directly from Japanese at the beginning of the 20th century. We 
still have to investigate the borrowing route of Sino-neologisms in more detail; however, at this point, 
there is no clear evidence of influence from Japanese, at least in the target words of my experiments. 


SEAlang Library Vietnamese Text Corpus: http://sealang.net/vietnamese/corpus.htm 


Appendix 1: Target words of the separation test 


Word in        Meaning of       Meaning of       Meaning of                               Chinese 
Quốc ngữ       1st morpheme     2nd morpheme     the entire word                          characters 
an tâm         be satisfied     heart            ‘be relieved’                            安心 
bãi công       cease            work             ‘go on strike’                           罢工 
báo danh       report           name             ‘enroll’                                 报名 
bảo mật        keep             secret           ‘keep a secret’                          保密 
bế mạc         close            curtain          ‘close (a ceremony, etc.)’               闭幕 
biến chất      change           quality          ‘go bad’                                 变质 
biến hình      change           shape            ‘be out of shape’                        变形 
cáo trạng      tell             complaint        ‘charge, indictment’ (noun)              告状 
chiêu sinh     invite           student          ‘recruit students’                       招生 
dụng công      use              work             ‘try hard’                               用功 
điểm danh      call             name             ‘call the roll’                          点名 
đính hôn       agree on         marriage         ‘be engaged to’                          订婚 
gia công       add              work             ‘process’ (verb)                         加工 
hành quân      go               army             ‘(troops) march’                         行军 
kết hôn        connect          marriage         ‘marry’                                  结婚 
khai khẩu      open             mouth            ‘open one’s mouth and say something’     开口 
khai mạc       open             curtain          ‘open (a ceremony, etc.)’                开幕 
lộ diện        show             face             ‘show up’                                露面 
luyện binh     practice         soldier          ‘train troops’                           练兵 
lưu tâm        keep             heart            ‘take care’                              留心 
lưu ý          keep             intention        ‘pay attention to’                       留意 
ly hôn         leave            marriage         ‘divorce’                                离婚 
mệnh danh      order            name             ‘name (a baby etc.)’                     命名 
nhập cảnh      enter            border           ‘enter into a country’                   入境 
nhập học       enter            study            ‘enter a school’                         入学 
nhập viện      enter            hospital         ‘be in the hospital’                     入院 
nhượng bộ      give way         step             ‘make concessions’                       让步 
phá sản        break            property         ‘go bankrupt’                            破产 
phạm pháp      violate          law              ‘break the law’                          犯法 
phạm tội       violate          crime            ‘commit a crime’                         犯罪 
phát bệnh      generate         disease          ‘become ill’                             发病 
phát điện      generate         electricity      ‘generate electricity’                   发电 
phát hỏa       generate         fire             ‘catch fire; be angry’                   发火 
phát tài       generate         wealth           ‘get rich’                               发财 
phân công      divide           work             ‘divide the work’                        分工 
phân loại      divide           kind             ‘classify’                               分类 
sinh bệnh      get              disease          ‘get ill’                                生病 
tạo phản       make             rebellion        ‘rebel’                                  造反 
thất học       lose             study            ‘be deprived of education’               失学 
thất nghiệp    lose             job              ‘be unemployed’                          失业 
thi công       carry out        work             ‘build’                                  施工 
thương tâm     hurt             heart            ‘be sorrowful’                           伤心 
tốt nghiệp     finish           job              ‘graduate’                               卒业 
trực ban       be on duty       shift            ‘be on duty’                             值班 
tuyên thệ      declare          notify           ‘declare’                                宣誓 
từ chức        resign           job              ‘resign’                                 辞职 
xuất cảnh      go out           border           ‘leave the country’                      出境 


Abbreviations 


ACC = accusative 

CL = classifier 

EXP = experiential 

GR = good result 

LP = linking particle 
PERF = perfective 
PROG = progressive 
SG = singular 

TOP = topic 

1, 3 = first, third person 


Abstract 

This paper provides a phonological description of Bagobo-Klata, a Southern Philippine 
language spoken on the eastern slopes of Mt. Apo, Davao City, the Philippines. Based on 
a 1,026-word list, the description is mostly concerned with the synchronic phonology of 
Bagobo-Klata but notes selected diachronic sound changes. The synchronic part discusses 
its phonemic inventory, syllable structure, segmental stress, phonotactics, and 
phonological processes, while the diachronic part lists and explains salient Bagobo-Klata 
reflexes of Proto-Malayo-Polynesian (hereafter, PMP) phonemes. Among the salient 
phonological features of Bagobo-Klata are its five-vowel system /a, €, 9, 1, u/, consonant 
clusters, geminates, several phonological processes triggered by affixation, and the 
historically deleted word-final *n (e.g., PMP *bulan > bula ‘moon, month’, PMP *dahun 
> daru ‘leaf’, PMP *ipen > ʔippa ‘tooth’, PMP *quzan > ʔula ‘rain’, and PMP *zalan > dala 
‘abdomen, path’). 


Keywords: Bagobo-Klata, Philippine languages, phonology 
ISO 639-3 codes: bgi 


1 Introduction 

This paper provides an account of the phonology of Bagobo-Klata, a Southern Philippine language 
largely spoken on the eastern slopes of Mt. Apo, Davao City, Davao del Sur, the Philippines, or in 
approximately thirty-six barangays—including Barangay Don Panaca in Magpet, Cotabato—scattered 
in Baguio, Calinan, and Tugbok districts. The speakers of this language call themselves Bagobo-Klata, 
the endonym, or just Klata, but they seem to be more commonly known as Giangan. This is quite evident 
in the literature of Philippine linguistics but has been rectified in the 24th edition of the Ethnologue, in 
which Bagobo-Klata replaced Giangan as the language name in its language profile (Blust 1991 & 2019; 
Eberhard, Simons & Fennig 2021; McFarland 1994; Walton 1979; Zorc 1986). Bagobo-Klata speakers 
can also be referred to generally as Bagobo, the collective term for the three ethnolinguistic groups (i.e., Bagobo-Klata, Obo 
Monuvu, and (Bagobo-)Tagabawa) living around Mount Apo. 


1.1 Previous Research 

There are two phonological accounts of Bagobo-Klata: Cagas (1991) and Evans (2017), both of which 
describe the basic features of its phonological system, namely, the sound inventory, syllable structure, 
gemination, consonant clusters, and segmental stress. Of the two accounts, Evans (2017) is the more 
thorough, specifically in its treatment of allophony, gemination, and consonant clusters, but it 
contains many generalizations that are unsupported by examples and must be re-examined because of 
their implications for the overall phonological analysis of Bagobo-Klata. 

First, the voiced alveodental tap /r/ is treated as phonemic, although no 
minimal pair is provided. The only supporting claim is that /r/ is in unconditioned free variation with /d/ in some 
roots such as /horoʔ/ ‘to stop’, but [hodoʔ] is not recorded. In contrast, Cagas (1991) states that /r/ 
is a variant of /d/ in intervocalic position. Second, the analysis of the allophones of Bagobo-Klata consonants 
and vowels is problematic: the environments that condition these allophones are not 
specified. For instance, gemination is claimed to be both phonemic and phonetic without evidence. In 
addition, vowel length is shown to be phonetic without a conditioning environment. Third, 
the glottal stop occurring as an onset is claimed to be epenthetic, implying that V is the basic syllable 
structure in Bagobo-Klata. Acoustic data would be needed to support such a claim. 

While general descriptions of Bagobo-Klata phonology exist, a new one that re-examines the 
generalizations in these studies and provides a fresh analysis of Bagobo-Klata phonology is warranted. 


1.2 Data 

The data used in this paper were gathered during linguistic fieldwork conducted by the author in
Barangay Sirib, Calinan District, Davao City from the 10th to the 27th of November 2019 and during a
series of virtual elicitation sessions from March to June 2021. Both data-gathering activities were
carried out with permission from the National Commission on Indigenous Peoples (NCIP) in Davao
Region. The data include a 1,026-word list recorded by the datu or chieftain of Barangay Sirib on the
27th of November 2019 and fully transcribed in January 2020, and a separate list of nominal and verbal
derivations elicited from two female language consultants via Facebook Messenger from March to
June 2021.


2 Consonants 

Bagobo-Klata has sixteen consonants, as shown in Table 1. These consonants contrast in place and
manner of articulation and in voicing. Note that in this paper [ɾ] is treated as allophonic, even
though it occurs in some Bagobo-Klata roots, such as /ki.'ɾop/ ‘to blink one’s eye’ and /ho.'ɾoʔ/
‘to stop’.


Table 1: Bagobo-Klata consonants 


              Bilabial   Alveodental   Palatal   Velar   Glottal
Stop          p b        t d                     k g     ʔ
Nasal         m          n                       ŋ
Fricative                s                               h
Tap                      [ɾ]
Lateral                  l
Approximant                            j         w


Shown in Table 2 are the minimal pairs of the fifteen contrastive consonants in Bagobo-Klata. 


Table 2: Bagobo-Klata segments exemplified in context 


Word-Initial Word-Medial Word-Final 
/p/ _ [pe.'ta?] ‘wet’ ['?i:.pit] ‘to sleep’ ['?i:.hip] ‘to whisper’ 
[' Pe:.ta?] ‘thigh’ ['?Pi:.nit] ‘hot’ ['?i:.hi?] ‘to whet a blade’ 
/b/ — [‘ba:.tuk] ‘cough’ [2?o.' bow] ‘mouse, rat’ [Pol.'lob] ‘spring’ 
['pa:.tuk] ‘duck’ [29.'low] ‘fence’ [Pol.'lot] ‘between’ 
/t/ — [to.'li] ‘string’ ['?e:.ta?] ‘calf (one’s leg)’ [kok.' kot] ‘to bury’ 
[?9.'li] ‘to choose’ ['Pe:.ma?] ‘armpit’ [kok.'kop] ‘to embrace’ 
/d/ — ['da:.?u] ‘leaf’ [ ‘tu:.duk] ‘oar’ ['Pa:.tad] ‘raft’ 
['ba:.?u] ‘smell, general’ ['tu:.kuk] ‘lazy’ ['Pa:.ta?] ‘rice husk’ 
/k/_ — [‘ke:.wo?] ‘stout’ [‘lu:.ka] ‘bowl’ [29.'lok] ‘to kiss’ 
[ le:.wo?] ‘narrow’ [‘lu:.wa] ‘ladle’ [29.'low] ‘fence’ 
/g/_ — [gol.'lot] ‘middle finger’ [' Pe:.gon] ‘feces’ [Pa.' Pog] ‘to take a bath’ 
[Pol.'lot] ‘between’ [' Pe:.kon] ‘tail’ [?a.' Pow] ‘to enter’ 
/2/ ~— [Pul.'Tu] ‘head’ [to.'?i] ‘to accompany’ [29.'to?] ‘nickname’ 
[hul.'lu] ‘to command’ [to.'li] ‘string’ [?o.'top] ‘roof’ 
Word-Initial
/m/ — ['ma:.li?] ‘good’ 
[pa.'li?] ‘wound’ 


/n/ _ [ne.'pes] ‘thin (object)’ 
[le. pes] ‘knife’ 


/y/ — [nil.‘lo] ‘ear’ 
[til.'lo] ‘flee (of a dog)’ 


/s/_— ['so:.a] ‘ankle’ 
[ ho:.?a] ‘thorn’ 


Ih/_— [ho.’bow] ‘milk’ 
[?9.' bow] ‘mouse’ 


A/  [los.’sun] ‘mortar’ 
[pos.'sun] ‘water jar’ 


/j/ — [‘ju:.pa] ‘centipede’ 
['ku:.pa] ‘thick’ 


Iwi — [wod.' dow] ‘afternoon’ 
[kod.'dow] ‘noon’ 


Word-Medial 
[ hi:.mat] ‘needle’ 
[hi.' kat] ‘fast’ 


[?1.'n9o] ‘mother’ 
['Pi:.wo] ‘saliva’ 


[Pan.'na?] ‘child’ 
[Pan.'ga?] ‘nickname 


? 


['pu:.hun] ‘heart’ 
[pu.' Pun] ‘to wrap’ 


[ ‘bu:.la] ‘moon’ 
[bu.'na] ‘to hit’ 


[ha.'jup] ‘false’ 
[ ha:.nup] ‘watermelon’ 


['bo:.wo?] ‘to pour’ 
[bo.'jo?] ‘face’ 
Word-Final
[lag.'gam] ‘bird’ 
[lag.' gas] ‘to wash hands’ 


[lud.'dun] ‘fish’ 
[lud.'dus] ‘to be lost’ 


[kol.'los] ‘strong’ 
[kol.'lot] ‘curly (of hair)’ 


[Po.'toj] ‘liver’ 
[Po .'top] ‘roof’ 


[Pot.'tow] ‘person’ 
[Pot.'tok] ‘brain’ 


Shown in Table 3 is the distribution of Bagobo-Klata consonants in all word positions. A few notes are
in order. First, all consonants (except the allophonic [ɾ]) occur word-initially. Second, word-medial
consonants can be singletons, geminates (segmental or derived), or members of word-medial clusters;
the examples shown below are singletons only. Third, only twelve consonants (all except /h, n, l/ and
[ɾ]) can occur word-finally. As will be explained later, PMP *h, *n, and *l are phonemically lost in
word-final position, although in some words word-final /n/ is not deleted, as in (1).


(1) pa.ka. ‘wan ‘cup’ 


/p/ 


/b/ 


sa. ‘ban ‘soap’ 
ma. ‘ja:.man ‘rich’ 


tu.'reʔ.kan ‘I do not know’


Table 3: Bagobo-Klata consonants in word-initial, -medial, and -final Positions 


#_ 

[pud.'du] ‘gall, bile’ 
[pit.'tu] ‘seven’ 
[po.'los] ‘many’ 


[‘bu:.lu] ‘to cut, as one’s hair’ 
[bul.'las ‘to change clothes’ 
[bol.'1] ‘poison’ 


[kal.'lan] ‘scab’ 
[ko.'lat ‘thin (of a person)’ 


[kom.'mi] ‘beard’ 


['gu:.hin] ‘water jar’ 


V_V 

['?o:.puj] ‘fire’ 
['ta:.pi?] ‘wall’ 
['le:.pos] ‘sibling’ 

[' Pe:.ban] ‘left’ 
['go:.bo] ‘to deceive’ 
['?u:.bi?] ‘to request’ 
[' Pe:.kon] ‘tail’ 
['ku:.kan] ‘cockroach’ 
[‘lu:.ka] ‘bowl’ 


['la:.gat] ‘sea’ 
_#

['?i:.hip] ‘to whisper’ 
[?o.'kap] ‘bark (of a tree)’ 
[?a.'kap] ‘monkey’ 


['la:.qub] ‘cave’ 
['nu:.?ob] ‘fingernail’ 
[his.'sib] ‘fat, grease’ 


[lop.'puk] ‘consumed’ 
['tu:.kuk] ‘lazy’ 
[pal.'lok] ‘sand’ 


['la:.lig] ‘happy’ 


/g/ 


/t/ 


/d/ 


/?/ 


/m/ 


/n/ 


/y/ 


/s/ 


iil 


/w/ 


#_ 


[gi.'ca] ‘last’ 
[go.'ton] ‘eggplant’ 


[tu.’ Pud] ‘deer’ 
[tob.’ bin] ‘buttock’ 
['ti:.?i] “pinky finger’ 


[da. lun] “below, underneath’ 
[da.' gow] ‘short (of time)’ 
[dun. nuk] ‘flood’ 


[?e.'no?] murky 
[Pon.' nop] mist 
['?i:.hin] ring (for one’s finger) 


['ma:.me] ‘leg’ 
[mo.'jo] ‘young woman’ 
[mo.' Po] ‘betel, areca nut’ 


[na.'nam] ‘taste’ 
[ne.'pes] ‘thin (of an object)’ 
[no.' Pos] ‘sound, noise’ 


[n9. nuk] ‘to hunt’ 
[nil.‘lo] ‘ear’ 
[nit.'ton] ‘dark’ 


[sos.'sop] ‘to suck’ 
[sa.lup.'pan] ‘loincloth’ 
[saj.jow] ‘to dance’ 


['ho:.lo] egg 
[his.'sip] louse (of a chicken) 
[hug.' gu] to push 


[jab.ba?.'na:.pu] ‘nightmare’ 
[joj.'jo] ‘shame’ 
[pa. ja] ‘big’ 


[wi.'ti] ‘hungry’ 
[wod.' dow] ‘afternoon’ 
[' wo:.?o] ‘abaca fibers, hemp’ 


[‘li:.tu] to ‘burn’ 
[lab.' bus] ‘poor’ 
[lal.'lom] ‘deep’ 


3 Geminates 
A geminate refers to a sequence of two identical adjacent consonants in a single morpheme (Crystal 
2008:206); a single consonant is called a singleton. In terms of word position, Bagobo-Klata geminates 
are strictly word-medial. Moreover, they can be segmental (i.e., occurring in roots) or derived (i.e., 
triggered by affixes). 
V_V
['mo:.gat] ‘mote’ 
['?i.g0?] ‘betel leaf’ 


['Pe:.ta?] ‘thigh’ 
[ 'ti:.toj] ‘bridge’ 
[ pu:.tow] ‘iron’ 


[ ‘le:.dok] ‘waterfall’ 
['ta:.do?] ‘to drip’ 
['to:.do] ‘to follow (as a trail)’ 


['ti:.Pow] ‘clear (as water)’ 

‘Per. PE ‘yes’ 

['tu:.?uw] to string together’ 
['nu:.ma] ‘story’ 

[‘lu:.mut] ‘moss’ 

[‘lo:.ma?] ‘knife (of a datu)’ 


[‘la:.nu] ‘sad’ 
['no:.no] ‘prawn’ 
['po:.na?] ‘bait’ 


['lu:.na] ‘shade’ 
['me:.na] ‘stove’ 
['lu:.qu] ‘coffin’ 


['ma:.si] ‘salty’ 
[ba:.sa] ‘to read’ 
[ku.'li:.sap] ‘dandruff’ 


['bo:.how] ‘arrow’ 
['‘to:.hu] ‘cucumber’ 
['ba:.how] ‘thirsty’ 


['ma:.jow] ‘raincloud’ 

['lu:.jo] ‘ginger’ 

['?i:.jup] to “blow (using one’s 
mouth)’ 


['tu:.wo] ‘old person’ 
['Pa:.wa] ‘rainbow’ 
[ ba:.we?] ‘medicine’ 


['wa:.loj] ‘tired’ 


['da:.la ]‘trail’ 
[‘bo:.la?] ‘bubbles’ 
_#
[po.ko.'log] ‘how’ 
['ka:.bog] fruit ‘bat’ 


[‘ku:.lit] ‘skin’ 
[lo.'met] ‘weak’ 
['ko:.het] ‘cave bat’ 


[te.’ kod] ‘heel’ 
[tul.'lid] ‘straight (as a stick)’ 
[‘bu:.lud] ‘mountain’ 


[lom.'m9?] ‘morning’ 
['Pe:.ma?] ‘armpit’ 
[ba.‘ju?] ‘cheek’ 


['?e:.lam] ‘wasp’ 
['?o:. jam] ‘yawn’ 
[?ik.'kam] ‘mat, for sleeping’ 


['tu:.ban] ‘in front of? 
['Pe:.gon] ‘feces’ 
[mes.'sen] ‘industrious’ 


['ma:.las] ‘spicy’ 
[nuw.' was] ‘accustomed’ 
['?u:.pus] ‘cat’ 


[Pal.'loj] ‘chin’ 
[lus.'suj] ‘gums’ 
['nu:.?uj] ‘whistle’ 


[?u. pow] ‘bald’ 
[pe.' Pow] ‘lame, crippled’ 
['ga:.how] ant 


[pak.'sul] ‘hole in the ground’ 
[ban.'kil] ‘canine tooth’ 


Geminate

pp

bb

tt

dd

kk

gg

ss

mm

nn

ŋŋ

ll

ww

jj
Table 4: Bagobo-Klata geminates


Examples 

[dip.'po] ‘armspan’ 

[?ip.'pus] ‘gasp, pant’ 

[lop. puk] ‘all gone, consumed’ 


[bab.'ba] “broken in pieces’ 
[tab. ban] ‘bland’ 
[mob. but] ‘animal’ 


[klat.'tan] ‘steps in a ladder’ 
[Pot.'tow] ‘person’ 
[nit.'ton] ‘dark’ 


[tud.'du?] ‘nape’ 
[?Pid.'du] ‘pity’ 
[lod.' dog] ‘rotten’ 


[bik.'ko] ‘necklace’ 
[?ik.'kam] ‘mat, for sleeping’ 
[lak.'ka?] ‘jackfruit’ 


[tug.' go] ‘post, house pole’ 
[Pog.' gi?] ‘cogon grass’ 
[lag.’ gu] ‘while’ 


[dis.'so?] ‘sty, in one’s eye’ 
[mes.'sen] ‘industrious’ 
[lis.'so] ‘seed, of fruit’ 


[dum.'mo] ‘other’ 
[?am.'mu?] ‘breast’ 
[lam.'mi] ‘new’ 


[ton. nob] ‘honey’ 
[?on.'nop] ‘fog, mist’ 
[lan.'na] ‘pus’ 


[pon.'nu:.?o] ‘chieftain’ 
[dan.'na] ‘before’ 
[hon.'now] ‘steam, vapor’ 


[pul.'lu?] ‘ten’ 
[bal.'las] ‘rice (uncooked)’ 
[lal.'lom] ‘deep’ 


[buw.' wa] ‘hammock’ 
[Puw.' wo] ‘two’ 
[luw.' wu] ‘winnowing basket’ 


[baj.'jo] ‘crocodile’ 


[gaj.'jo] ‘south wind’ 
[joj.'jo] ‘shame’ 
Shown in Table 4 are segmental geminates. Only thirteen consonants (all except the glottals /ʔ, h/ and
the allophonic tap [ɾ]) can occur as geminates, but a few affixes can trigger gemination of the glottals.
As shown in (2) and (3), /h/, at least word-initially, becomes [ss] with the distributive numeral affix
to- and the realis patient-voice (PV) affix bo-.


(2) to- + ho. ‘tu > tas. 'so..tu 
DN one one each 

(3) ba- + hal. ‘la > bos.sal. ‘la 
RLS.PV to fry rice RLS.PFV-fry rice 


On the other hand, the glottal stop /ʔ/, specifically in word-initial position, becomes a geminate and
has several allophones [gg, tt, ll, jj, ww], as shown in (4), (5), (6), (7), and (8). What is certain, thus
far, is that the derived geminate forms of /ʔ/ are due to fortition.


(4) bo- + "Pe!.pok > boj. je:.pak 
RLS.POT to cut RLS.POT-cut 
(5) bo- + "Pe. pak => bog. 'ge:.pok 
RLS.PV to cut RLS.PV-cut 
(6) po- + Pan. ‘yar > poal.lan. ‘ya? 
CAUS.RLS.PV child CAUS.RLS.PV-give birth 
(7) to- + Puw. ‘wo > tal.lu. ‘wa 
DN two DN-two (two each) 
(8) to- + Pot. ‘tow > tat.tat. ‘tow 
NOM person NOM.-person (self) 


Like consonants and vowels, segmental geminates are also contrastive, as shown in Table 5. 


Table 5. Contrastive geminates in Bagobo-Klata 


Geminate Singleton 

[pot.'toj] ‘firefly’ [po.'toj] ‘to die, to kill someone’ 
[bol.'loj] ‘to give’ [bo.'loj] ‘house’ 

[tap.'pe] ‘old (object)’ [ta.’ pe] ‘twin’ 

[ton. go] ‘half? ['to:.y0] ‘nipple, teat’ 

[kap.' pen] ‘to split into halves’ _[ka.'pen] ‘wood stick’ 

[hud.'du?] ‘lad’ ['hu:.du?] ‘to carry on one’s head’ 
[lim.'mo] ‘hand’ [li.'mo] ‘five’ 

[?in.'no] ‘to look’ [?1.'no] ‘mother’ 

[Piw.'wo] ‘envy’ [' Pi:.wo] ‘saliva’ 
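
The geminate/singleton contrast in Table 5 lends itself to a mechanical check. The following is a minimal Python sketch, assuming that transcriptions are syllabified with a period and that a geminate is a consonant that closes one syllable and opens the next. The function name `find_geminates` and the vowel set are illustrative assumptions, not part of the original analysis.

```python
# Vowel symbols assumed for illustration; "e" and "o" are included so that
# forms typed exactly as printed in the tables are handled as well.
VOWELS = set("iuɛɔaeo")

# Characters that carry no segmental information in this sketch:
# brackets, slashes, stress marks, and length marks.
_STRIP = str.maketrans("", "", "[]/'ˈ:ː")


def find_geminates(form: str) -> list:
    """Return each consonant that closes one syllable and opens the next."""
    syllables = [s for s in form.translate(_STRIP).split(".") if s]
    geminates = []
    for left, right in zip(syllables, syllables[1:]):
        if left[-1] == right[0] and left[-1] not in VOWELS:
            geminates.append(left[-1] * 2)
    return geminates


if __name__ == "__main__":
    # A pair from Table 5, typed as printed there.
    print(find_geminates("[pot.'toj]"))  # geminate member of the pair
    print(find_geminates("[po.'toj]"))   # singleton member of the pair
```

Because the check only compares a coda with the following onset, it treats segmental and derived geminates alike; telling them apart would require morphological information that this sketch does not model.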
4 Vowels
Bagobo-Klata has five vowels that contrast in vowel quality, tongue position, and the degree of lip
rounding. As shown in Table 6, only /u/ and /ɔ/ are rounded (fully and half rounded, respectively).

Table 6: Bagobo-Klata vowels


         Front   Central   Back
Close    i                 u
Mid      ɛ                 ɔ
Open             a


Shown in Table 7 are the minimal pairs of Bagobo-Klata vowels in all syllable positions.


Table 7: Minimal pairs of Bagobo-Klata vowels 


Penultima 

/i/ — [kid.'dut] ‘astonishment’ 
[kod.' dut] ‘to pinch’ 

/u/ — [kul.'li] ‘a long time ago’ 
[kil.‘li] ‘eel’ 

lel ['?e:.ta?] ‘calf (of one’s leg)’ 
['Po:.ta?] ‘to vomit’ 

/3/_— [bol.'li] ‘to buy’ 
[bul.'li] ‘evening/night’ 

/al_ [?ap.'pat] ‘four’ 


[Pup. pat] ‘to jump’ 


Ultima 
[tob. bi] ‘to sew’ 
[tob.' bu] ‘sugarcane’ 


['tu:.wu] ‘to grow’ 
['tu:.wo] ‘old’ 


['lo:.le] ‘valley’ 
['lo:.la] ‘nest’ 


[Pul.'lod] ‘larva’ 
[?ul.'lud] ‘floor’ 


[Pan.'nat] ‘to wait’ 
[Pan.'net] ‘to bite’ 


Both Pen/Ultima 
['ti:.?i] ‘pinky finger’ 
['te:. Pe] ‘hard’ 


[klut.'tun] ‘forehead’ 
[klat.'tan] ‘ladder steps’ 


[ he:.le] ‘to lean’ 
['be:.le] ‘to stay’ 


[bob'bo] ‘window’ 
[bub 'bu] ‘feather’ 


[hal.'‘la] ‘to fry rice’ 
[hul.'lu] ‘to command’ 


Moreover, all five Bagobo-Klata vowels can occur in all syllable positions, as exemplified in Table 8. 


/i/


/u/


/ɛ/


/ɔ/


/a/


Table 8: Bagobo-Klata Vowels in Antepenultima, Penultima, and Ultima 


Antepenultima 
[ma.li.gon.'noj] ‘beautiful’ 
[pi.jas.'su] ‘spear’ 


[pu.' wa:.las] ‘forest, woods’ 


[te.ne.'pog] ‘mens’ pants’ 


[ho.mo.bo.'jo?] ‘ugly’ 
[bol.'li:.joj] ‘drunk’ 


[Pa.'ri: jus] ‘earrings’ 
[tam.'ba:.ga] ‘copper’ 


5 Syllable Structure 
In Bagobo-Klata, the basic syllable structure is (C1)C2V(C3): the obligatory syllable elements are the 
onset and the nucleus, while the coda and the extra onset, which forms a cluster with the obligatory 
onset, are optional. Shown in Table 9 are the four permissible syllable structures in Bagobo-Klata, along 
with examples. 


Penultima 
['ti:.nut] ‘greedy, selfish’ 
['‘li.tu:] ‘mole, birthmark’ 


[ ‘ku: lit] ‘skin’ 
[lul.'lut] ‘a Bagobo-Klata dish’ 


[te. kod] ‘heel’ 
[let.'ta?] ‘sap, resin’ 


[ko. lat] ‘thin (of a person)’ 
[Pol.'lon] ‘neck’ 


['Pa:.wak] ‘waist’ 
[dan.'now] ‘handspan’ 
Ultima
[lit.'ti] ‘thunder’ 
[‘tu:.li?] ‘earwax’ 


['da:.mu] ‘dew’ 
['gi:.bud] ‘fontanelle’ 


[la.' we] ‘tall (of a person)’ 
[Pog.' get] ‘clothing’ 


[kop.'po] ‘chest’ 
[kom.'mon] ‘fist’ 


[bon.'na] ‘true’ 
[ka.'jang] ‘bright, as light’ 
Table 9: Syllable patterns in Bagobo-Klata


Syllable Structure 


CV 


CVC 


CCV 


CCVC 


Examples 

[bo.'jo?] ‘face’ 
[hi.'ju] ‘elbow’ 
['pa:.?9] ‘foot’ 
[mo.'to] ‘eye’ 
['gi:.bud] ‘hair whorl’ 


[klut.'tun] ‘forehead’ 

[nil.‘lo] ‘ear’ 

[?ob.'buk] ‘hair, of one’s head’ 
[?il.'lo?] ‘mole, birthmark’ 
[bok.'kon] ‘shin, of one’s leg’ 


[' gka:.wan] ‘hip’ 
['mle:.de] ‘yellow’ 
['klu:.go?] ‘wart’ 

[' gna:.lo] ‘throat’ 
['kle:.wen] ‘eyebrow’ 


[klam.'mag] ‘star’ 
['mle:.?an] ‘rough’ 
[blun.'qus] ‘mouth’ 
[mlun.'now] ‘green’ 
[kjuw.' wa] ‘bee’ 
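
The template (C1)C2V(C3) and the four shapes in Table 9 can be expressed as a single pattern. The sketch below is a hedged illustration in Python: the consonant and vowel strings are one reading of the inventories in Tables 1 and 6, and the names `SYLLABLE` and `syllable_shape` are hypothetical helpers introduced here only for the example.

```python
import re
from typing import Optional

# Assumed inventories (a reading of Tables 1 and 6, not a definitive list).
CONSONANTS = "pbtdkgʔmnŋshɾljw"
VOWELS = "iuɛɔa"

# (C1)C2 V (length) (C3): one or two onset consonants, one vowel,
# an optional length mark, and an optional single coda consonant.
SYLLABLE = re.compile(
    rf"^([{CONSONANTS}]{{1,2}})([{VOWELS}])([ː:]?)([{CONSONANTS}]?)$"
)


def syllable_shape(syllable: str) -> Optional[str]:
    """Return 'CV', 'CVC', 'CCV', or 'CCVC', or None if the template fails."""
    match = SYLLABLE.match(syllable)
    if match is None:
        return None
    onset, _nucleus, _length, coda = match.groups()
    return "C" * len(onset) + "V" + ("C" if coda else "")


def word_shapes(word: str) -> list:
    """Classify every syllable of a period-delimited word (stress marks removed)."""
    return [syllable_shape(s) for s in word.replace("'", "").split(".")]


if __name__ == "__main__":
    print(word_shapes("klut.'tun"))  # 'forehead' in Table 9
    print(word_shapes("hi.'ju"))     # 'elbow' in Table 9
```

An onsetless syllable returns `None`, which mirrors the claim that the onset is obligatory; the sketch does not restrict which consonant pairs may form a complex onset.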


Moreover, Bagobo-Klata has consonant clusters, that is, sequences of two consonants. In terms of word
position, they can be complex onsets (i.e., two onset consonants occurring word-initially) or
word-medial clusters (i.e., a coda plus an onset occurring word-medially). Complex onsets can be
segmental or derived. Shown in (9), (10), (11), (12), and (13) are the permissible combinations of C1
and C2 in Bagobo-Klata.


(9) a stop + a lateral
    pl    'ple:.tek ‘wing’
    bl    'bli:.bud ‘whirlpool’
    kl    'kli:.tut ‘anus’
    gl    'glo:.puj ‘bee’

(10) a stop + a stop
    gk    'gka:.waŋ ‘hip’

(11) a stop + an approximant
    bj    bja.'ʔow ‘drizzle’
    kj    'kja:.haŋ ‘cockroach’
    gj    'gja:.wat ‘pimple’
    kw    kwa.'lo ‘worm, earthworm’
    gw    gwa.'li ‘across, opposite side’

(12) a nasal + a lateral
    ml    'mlo:.ʔos ‘smooth’

(13) a stop + a nasal
    gn    'gna:.lo ‘throat’

On the other hand, Bagobo-Klata has word-medial clusters, which can be homorganic (i.e., a
word-medial sequence of a nasal and another consonant that share a place of articulation) or
heterorganic (i.e., a sequence of two consonants that differ in place of articulation). However, these
clusters are rare because, as will be seen in 8.6, historical clusters in PMP forms became geminates in
Bagobo-Klata.


Table 10: Word-medial clusters 


Homorganic                             Heterorganic

[pan.'tug] ‘bladder’                   [kas.'pa] ‘dandruff’
[ban.'tu:.gan] ‘famous’                [pak.'sul] ‘hole (in the ground)’
[baŋ.'kil] ‘canine tooth, fang’        [lug.'waʔ] ‘outside’


Like phonemes and geminates, segmental complex onsets in Bagobo-Klata are also contrastive, as 
shown in (14). 


(14) — klom.'ma? ‘tomorrow’ 
lam. ‘ma? ‘morning’ 


Finally, the number of syllables in Bagobo-Klata words depends on whether the words in question are 
unaffixed or derived. In roots, monosyllabic or disyllabic words are more common than trisyllabic ones, 
while in derived words, disyllabic words are as common as trisyllabic ones. 


6 Segmental Stress 
In this study, segmental stress is defined as the prominence (i.e., loudness) of a specific syllable in a
given word (Crystal 2008). First, in Bagobo-Klata, segmental stress is contrastive, as shown in Table
11.


Table 11: Contrastive stress in Bagobo-Klata 


Penultima Ultima 

['?u:.la] ‘snake’ [?u. la] ‘rain’ 
['pa:.toj] to ‘quarrel’ [po.'toj] ‘to kill’ 
[ 'ba:.sa] to ‘read’ [ba.'sa] ‘squash’ 
['?9:.lat] ‘scar’ [?o. lat] ‘vein’ 
['ku:.lun] to ‘snore’ [ku. Tun] ‘back’ 
['a:.tin] ‘sweat’ [Pa. tin] ‘if? 
[‘la:.wu] ‘cloud’ [la."wu] to ‘fall’ 


Second, stress is also predictable from vowel length. Table 12 shows that stress falls on the ultima,
whether light (CV) or heavy (CVC), if the word has no long vowel.
Table 12: Stress in ultima


CV 

[to.' Pi] ‘to accompany’ 
[?o.'ka] ‘what-you-call-it’ 
[da.'ja] ‘west’ 

[mo.'la] ‘light, in weight’ 
[lo."wa] ‘wide’ 

[la.'we] ‘tall, of a person’ 
[pa. ja] ‘big’ 

[ba.'sa] ‘pumpkin, squash’ 
[mo.' 9] ‘betel, areca nut’ 
[ka.'Pa] ‘to eat’ 


CVC 

[hi.'jaw] ‘nine’ 
[?u. lob] ‘knee’ 
[lo.'tus] ‘hundred’ 
[go.' tan] ‘eggplant’ 
[do. lid] ‘root’ 
[ta."lum] ‘papaya’ 
[?u. waj] ‘rattan’ 
[pa. et] ‘bitter’ 
[29.'lak] ‘to kiss’ 
[lo.' uj] ‘to swim’ 


On the other hand, Table 13 shows that if a word has vowel length, stress falls on the penultima. 


Table 13: Stress in penultima 


CV 

['du:.lu?] ‘blood’ 
['pu:.hun] ‘heart’ 
['ba:.gok] ‘disease’ 
['ka:.gow] ‘microbe’ 
['ma:.li?] ‘good’ 

[ hu:.ko] ‘anger’ 

['ka:.po] ‘cold, of weather’ 


['ba:.lu] ‘provision, as for a journey’ 
['ma:.?a] ‘sheath, as of a bolo/knife’ 


[ la:.?i] ‘man, male’ 


CCV 

['gla:.?u] ‘throat’ 

[ kli:.tut] ‘anus’ 

[' gja:.wat] ‘pimple’ 
['mle:.de] ‘yellow’ 
[' gnu:.wo] ‘earth’ 

[ bja:.?9] ‘year’ 

[ ble:.?¢] ‘gecko’ 

[ kli:.gi?] ‘hawk’ 
['blo:.wa?] ‘spider’ 
['gka:.wan] ‘hip’ 
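
The generalization behind Tables 12 and 13 (stress on the penultima when it contains a long vowel, otherwise on the ultima) can be stated as a short function. This is a minimal Python sketch under that reading; vowel length is taken as given in the input, and the names `has_long_vowel` and `stressed_syllable` are hypothetical.

```python
def has_long_vowel(syllable: str) -> bool:
    """A syllable counts as long if it carries a length mark (':' or 'ː')."""
    return ":" in syllable or "ː" in syllable


def stressed_syllable(syllables: list) -> int:
    """Return the index of the stressed syllable.

    Stress falls on the penultima if it has a long vowel; otherwise it
    falls on the ultima, whether that syllable is light or heavy.
    """
    if len(syllables) >= 2 and has_long_vowel(syllables[-2]):
        return len(syllables) - 2
    return len(syllables) - 1


if __name__ == "__main__":
    # The pair 'to read' / 'squash' from Table 11 differs only in length and stress.
    print(stressed_syllable(["ba:", "sa"]))
    print(stressed_syllable(["ba", "sa"]))
```

The function says nothing about where length itself comes from; it only encodes the dependency between length and stress described in this section.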


7 Morphophonological Processes 

This section discusses several morphophonological processes in Bagobo-Klata. In this study, they are 
called as such because they (except reduplication and cliticization) are mostly triggered by affixation. 
As will be seen in the following subsections, one affix, usually verbal, can trigger up to seven processes. 


7.1 Syllable Deletion 

In Bagobo-Klata, syllable deletion is a morphophonological process that deletes the least prominent
syllable (the antepenultima) in an affixed word. Three affixes in Bagobo-Klata have been identified
that trigger syllable deletion, namely, the irrealis patient-voice (PV) -ɔ, the irrealis
benefactive/locative-voice (B/LV) -a, and the nominalizing -a.

There are only a few verbs to which the PV and B/LV verbal affixes -ɔ and -a can be attached. Once
attached to verbs, these affixes trigger syllable deletion. In (15) to (17), the verb roots are all
disyllabic and, after affixation, become trisyllabic, but -ɔ deletes the antepenultimas (i.e., the ones
in bold), while the stress shifts to the penultima.


(15) ‘Pe.jap + -o > "PE! Ja.po > ja..po 

to count count-IRR.PV count-IRR.PV 
(16) ha. ‘wet + -9 = ha. ‘we..to — ‘we..to 

to hook hook-IRR.PV hook-IRR.PV 
(17) pep. ‘per + -9 > pep. pe:. 9 > pes. P29 

to wash wash-IRR.PV wash-IRR.PV 
Shown below are Bagobo-Klata verbs affixed with -a, which deletes all the antepenultimas.


(18) ‘ta:.way + -a > ta:.wa.ya => ‘kwa..ya 

to help help-IRR.BV help-IRR.BV 
(19) ‘ka:.peP +-a > ‘ka: pe.?a > pel. ra 

to hold hold-IRR.BV hold-IRR.BV 
(20) ‘ha..kaj + -a > ‘ha:.ko.ja > ‘ka. ja 

to ride ride-IRR.LV ride-IRR.LV 


Moreover, it will be seen in the subsequent processes (e.g., assimilation, gemination, epenthesis, 
fortition, and lenition) that syllable deletion interacts with them and is required to take place before or 
after them. 


7.2 Vowel Harmony 
Vowel harmony is a morphophonological process in which two adjacent vowels become similar
(Crowley & Bowern 2010). The affixes that trigger this process are the verbal affixes -ɔ and -a. Derived
words such as kɔnɔ and ŋɔja will be used to explain this process.

In deriving the patient-voice affixed verb kɔnɔ, there are four morphophonological processes at
play, which will be explained in order. First, after -ɔ is affixed to ka.'ʔa ‘to eat’, the historically
deleted word-final *n is reinstated.


(21) Epenthesis 
ka. ‘Pa + -9 — ka. 'Pa:.no 
toeat IRR.PV eat-IRR.PV 


After (21), the onset /?/ in the penultima becomes [k], because of adjacent assimilation to the onset /k/ 
of the antepenultima. 


(22) Adjacent Assimilation 
ka. 'Pa:.no > ka. ka:.no 
eat-IRR.PV eat-IRR.PV 


Third, the antepenultima [ka] is deleted because of -ɔ.


(23) Syllable Deletion 
ka. ‘ka:.na > ‘ka:.no 
eat-IRR.PV eat-IRR.PV 


Fourth, in the disyllabic form, /a/ in the penultima becomes [ɔ] because -ɔ triggers vowel harmony.


(24) Vowel Harmony 
‘ka:.no —> ‘ko:.no 
eat-IRR.PV eat-IRR.PV 
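
The four ordered steps in (21) to (24) can be traced with a small rule pipeline. The Python sketch below is an illustration under stated assumptions: forms are lists of syllables, length is written with a colon, the suffix vowel is taken to be ɔ, and every function name is hypothetical. It reproduces this one derivation and is not a general grammar of the language.

```python
def epenthesize(root: list, suffix: str, consonant: str) -> list:
    """(21) Reinstate a consonant as the onset of the suffix syllable.

    The former ultima becomes the penultima and is written long, as in (21).
    """
    return root[:-1] + [root[-1] + ":", consonant + suffix]


def assimilate_onset(form: list) -> list:
    """(22) The penultima onset copies the onset of the antepenultima."""
    ante, penult, ultima = form
    return [ante, ante[0] + penult[1:], ultima]


def delete_antepenult(form: list) -> list:
    """(23) Delete the antepenultima of the trisyllabic form."""
    return form[1:]


def harmonize(form: list) -> list:
    """(24) The penultima vowel /a/ takes the quality of the suffix vowel."""
    penult, ultima = form
    return [penult.replace("a", ultima[-1]), ultima]


def derive(root: list) -> list:
    """Return every stage of the derivation, from root to surface form."""
    stages = [root]
    stages.append(epenthesize(stages[-1], "ɔ", "n"))
    stages.append(assimilate_onset(stages[-1]))
    stages.append(delete_antepenult(stages[-1]))
    stages.append(harmonize(stages[-1]))
    return stages


if __name__ == "__main__":
    for stage in derive(["ka", "ʔa"]):
        print(".".join(stage))
```

Keeping each process as its own function makes the ordering explicit: swapping `assimilate_onset` and `delete_antepenult` would fail, because the onset that triggers assimilation would already be gone.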


The ordering proposed here may seem surprising because it would be more convenient for
Bagobo-Klata speakers to delete the penultima of [ka.'ʔa:.nɔ] instead of the antepenultima. However,
the examples in 7.1 show that the syllable that -ɔ and -a delete in affixed verbs is the antepenultima.
Thus, in (23), before the antepenultima [ka] is deleted, the onset of the penultima assimilates to that
of the antepenultima. As for the harmony of [a] in the penultima ['ka:] to [ɔ], it can be a case of
reinstatement of a Bagobo-Klata vowel that historically shifted. As will be explained in 8.5, /ɔ/ and
/a/ are reflexes of PMP *e and *a, respectively. Both proto-phonemes sporadically assimilated to each
other in some proto-forms. In the case of /ka.'ʔa/ as the reflex of PMP *kaen, *e is argued to have
assimilated to *a, hence becoming /a/, but its historically unassimilated reflex /ɔ/ is reinstated
through vowel harmony.


(25) Affixation 
ray. nag +#-a ray. 49.ja 
totake IRR.AV 


(26) Syllable Deletion 
Pay. 49.ja > ya. ja 


(27) Vowel Harmony 
4a..ja > ya:ja 
take-IRR.PV 


In the case of ʔaŋ.'ŋaj ‘to take’, there are only two processes involved: syllable deletion and vowel
harmony. What can perhaps be deduced here is that vowel harmony requires syllable deletion to take
place first, at least in Bagobo-Klata verbs affixed with -ɔ and -a.


7.3 Gemination 
Gemination is also a morphophonological process in which a singleton becomes a geminate (Campbell,
2013). As previously discussed, geminates are not only segmental but also derived through affixes such
as the irrealis PV -ɔ, the realis PV and B/LV bo-C~(...-a), the realis instrumental voice (IV) po-C~, and
the realis abilitative/potentive ko-C~.

First, -ɔ does not apply to all verbs. It appears to require a verb that has a heavy ultima with /ɔ/
as its nucleus, although the nucleus can also be /ɛ/ or /a/. As with vowel harmony, syllable deletion
precedes gemination, so three processes are involved after the -ɔ affixation in (28).


(28) Affixation 


Pop. pas + -9 > Pop. PI.8I 

to borrow-IRR.PV borrow-IRR.PV 

pom. mot > pom. '‘mo.ta 

to catch catch-IRR.PV 

kok. ‘kot > kok. ‘ko.to 

to dig dig-IRR.PV 

‘na.ta > not. ‘ta 

to ask a question ask a question-IRR.PV 


After the affixation in (28), at least, syllable deletion and degemination take place simultaneously,
as shown in (29).


(29) Syllable Deletion 


Pop. PI.8I => 'PI.S9 

borrow-IRR.PV borrow-IRR.PV 

pom. '‘m9..to > ‘mo.to 

catch-IRR.PV catch-IRR.PV 

kok. ‘ko..to > ‘ko. ta 

dig-IRR.PV dig-IRR.PV 

‘na..to > ‘na.ta 

ask a question-IRR.PV ask a question-IRR.PV 
(30) Gemination


'p9.S9 => Pos. ‘so 
borrow-IRR.PV borrow-IRR.PV 

‘mo. to > mot. ‘to 

catch-IRR.PV catch-IRR.PV 

‘ko. ta > kot. ‘to 

dig-IRR.PV dig-IRR.PV 

‘na.ta > not. ‘ta 

ask a question-IRR.PV ask a question-IRR.PV 


Second, in (31-34), it is shown that Bagobo-Klata allows two geminates in one word through bo-C~, 
bo-C~...-a, po-C~, and ko-C~. 


(31) bo- +. kok. ‘kot > bob.bal li 
RLS.PV to dig RLS.PV-dig 
hap. ‘pj => bos.sap. ‘p2j 
to hang RLS.PV-hang 
tat. ‘tad => bot.tat. ‘tad 
to mince RLS.PV-mince 


(32) bo-...-a + bul. ‘las > bol.las. ‘sa 


RLS.BV to change RLS-change-BV 
ka'?a > bok.kan. ‘na 
to eat RLS-eat-LV 
(33) po- + tak. ‘kub > potakkub 
RLS.IV to cover RLS.IV-cover 
lag. ‘gas > pol.lag. ‘gas 
to wash RLS.IV-wash 
(34) ko- + loy. ‘na? > kol.loy. ‘ya? 
RLS.ABI to flee RLS.ABIL-flee 
lut. ‘tu > kol.lut. ‘tu 
to cook RLS.ABIL-cook 
7.4 Epenthesis 


Epenthesis is defined as the insertion of a speech sound, typically a consonant, in a given word due to
affixation. This happens when a suffix is attached to a word with a light ultima: because vowel
sequences are impermissible in Bagobo-Klata, an epenthetic consonant must intervene as the obligatory
onset.

The epenthetic forms [n] and [l] in (35) have traceable origins. As previously stated, word-final *n
and *l were historically deleted in Bagobo-Klata, but they are reinstated through affixation,
specifically epenthesis. In the examples below, the reinstated [n] and [l] serve as the respective onset
of each ultima.


(35) Epenthesis 


Pu. la + -a > Pu. la.na 

rain IRR.LV rain-IRR.LV 
bu. ‘na + -2 > bu. ‘na.lo? 

to hit IRR.PV to hit-IRR.PV 


As seen in (35), epenthesis precedes syllable deletion, and the same can be said for affixed verbs such 
as kolana ‘to rain’ and nalo ‘to hit’. 
(36) Syllable Deletion


Pu. 'la.na > la:.na 
rain-IRR.LV rain-IRR.LV 
bu. ‘na.lo? > ‘na. lo? 
hit-IRR.PV hit-IRR.PV 


(37) Affixation 
ko- + Ta:.na —> ko. 'la:.na 
POT rain-IRR.LV POT.IRR-rain-LV 


However, there are epenthetic forms with obscure origins such as in (38). 


(38) Epenthesis 


‘he: le + 9) > ‘he: le.lo 
to lean IRR.PV 
ta. ‘Pi + -a > ta. ‘Pina 
to accompany IRR.BV 


In /to.'ʔi/, /ʔ/ assimilates to /t/, becoming [t] after epenthesis.


(39) Syllable Deletion 


‘he..le.lo = le:.lo 

lean-IRR.PV lean-IRR.PV 

to. ‘tina => ‘tina 
accompany-IRR.BV accompany-IRR.BV 


Then, in more examples below, it is seen that syllable deletion precedes epenthesis and that these 
epenthetic forms [k, g], which act as the C1 in complex onsets, have even more obscure origins. 


(40) Affixation 


‘ta:.way + -a = ‘ta:.wa.ya 
to help IRR.BV help-IRR.BV 
‘be..neP + -a > be..ne.?a 

to cry IRR.BV help-IRR.BV 
do:.loyn + -2 > ‘da..lo.ya 

to honor IRR.PV honor-IRR.PV 


(41) Syllable Deletion 


‘ta:.wa.ya > wa..4ya 
help-IRR.BV help-IRR.BV 
be..ne.?a > né..Pa 
help-IRR.BV help-IRR.BV 
‘do:.lo.ya => lo..49 
honor-IRR.PV honor-IRR.PV 


(42) Epenthesis 


wa..4ya > ‘kwa..ya 
help-IRR.BV 
neé..Pa > 'gne:. Pd 
cry-IRR.BV 
loy. ‘no > gloy. ‘yo 


honor-IRR.PV 
7.5 Fortition

Fortition refers to the morphophonological process in which a speech sound becomes more sonorous or 
acoustically louder. In general, sonority or acoustic loudness of speech sounds can be gauged through 
a hierarchy. As shown in Figure 1, the sounds to the left are more sonorous, while the ones to the right 
are less sonorous (Hayes 2009:75). 


Figure 1: The Sonority Hierarchy 


greater sonority                                  lesser sonority
vowels > glides > liquids > nasals > obstruents


As mentioned in Section 3, word-initial /ʔ/ becomes [gg, tt, ll, jj, ww] because of certain affixes that
trigger gemination. To explain this, it will be argued in this study that in Bagobo-Klata, gemination
requires more sonorous consonants, so /ʔ/ must undergo fortition to be eligible for gemination. To
illustrate this argument clearly, derived forms such as jada ‘chair’ and woga ‘bathroom’ will be used.
Note that these forms are verbal and nominal, respectively.


(43) Affixation 


"Pe. Pad + -d > "Pé:.P9.da 
to sit IRR.LV sit-IRR.LV 
ra. Pog -a > ra. 'P9.ga 
to bathe NOM bathe-NOM 


(44) Syllable Deletion 
'Pe..P9.da > 'P2:.da 


sit-IRR.LV sit-IRR.LV 
ra. 'P9.ga > P0..2a 
bathe-NOM bathe-NOM 


(45) Fortition 


'Po:.da > ja..da 
sit-IRR.LV sit-IRR.LV 
70..2a > '‘wa..ga 
bathe-NOM bathe-NOM 


Since bo- is established to trigger gemination, it would make sense for the word-initial glottal in <oda>
to undergo fortition, hence <yoda>. Thus far, what is certain is that [g] expresses patient voice, while
[j] expresses potentive. Examples (5) and (4) in Section 3 are repeated here as (46) and (47).


(46) bo- + "Pe. pak > bog. 'ge:.pok 
RLS.PV to cut RLS.PV-cut 
(47) bo- + "Pe. pak > boj. je:.pak 
RLS.POT to cut RLS.POT-cut 
7.6 Lenition 


Lenition is the opposite of fortition, that is, a change from more sonorous to less sonorous. In
Bagobo-Klata, -a triggers lenition of word-final /s/ in the examples below.

(48) Affixation
     lag.'gas            >    lag.'ga.sa
     to wash                  wash-NOM
     kik.'kis            >    kik.'ki.sa
     to scrape                scrape-NOM
     lag.'gas            >    lag.'ga.so
     to wash                  wash-IRR.PV

(49) Syllable Deletion
     lag.'ga.sa          >    'ga:.sa
     wash-NOM                 wash-NOM
     kik.'ki.sa          >    'kis.sa
     scrape-NOM               scrape-NOM
     lag.'ga.so          >    'ga:.so
     wash-IRR.PV              wash-IRR.PV

(50) Lenition
     'ga:.sa             >    'ga:.ha
     wash-NOM                 wash-NOM (sink)
     'kis.sa             >    'ki:.ha
     scrape-NOM               scrape-NOM (coconut scraper)
     'ga:.so             >    'ga:.ho
     wash-IRR.PV              wash-IRR.PV

As seen in (48), there are two processes involved: after affixation, the antepenultimas in (49) are deleted.
Then, in (50), the singleton /s/ lenites to /h/.

Perhaps, the lenition of /s/ in the examples above is a case of reinstatement of the historically deleted 
word-final *h. Through lenition, word-final /s/ synchronically reverts to [h]. 


7.7 Syncope 


Syncope is a morphophonological process in which a medial segment, typically a vowel, is deleted 
(Crowley & Bowern, 2010). In Bagobo-Klata, the unstressed nucleus in penultima of disyllabic words 
such as /bol.'1oj/ and /bol.'li/ is deleted when these words are affixed with -o. 


(51) Affixation 


bal. ‘laj 
to give 


bal. ‘li 
to buy 


bal. li 
to buy 


(52) Degemination 


bol. ‘loja 
give-IRR.BV 
bol. 'lija 
buy-IRR.BV 
bol. 'li.ja 
buy-IRR.PV 


bo. ‘la.ja 
give-IRR.BV 
bo. ‘li.ja 
buy-IRR.BV 
bo. ‘li.ja 
buy-IRR.PV 


bol. 'loja 
give-IRR.BV 


bol. 'li.ja 
buy-IRR.BV 


bol. 'li.ja 
buy-IRR.PV 
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As can be noticed in (53), syncopated nuclei in the antepenultima derive complex onsets. 


(53) Vowel Syncope 
bo. ‘laja 
give-IRR.BV 
bo. ‘li.ja 
buy-IRR.BV 
bol. 'li.ja 
buy-IRR.PV 


7.8 Degemination 


> blo. ja 
give-IRR.BV 

=> bli. ja 
buy-IRR.BV 

> bli. jo 
buy-IRR.PV 


Degemination is a process in which a geminate becomes a singleton. In the data set, the distributive 
numeral affix /to-/ and the irrealis benefactive/locative -a are known to reduce the geminate in a root to 


a singleton. 


(54) to- +  ʔuw.'wo   > tol.'lu:.wo 
     DN     two         two each 
            tol.'lu   > tot.'to:.lu 
            three       three each 
            ʔap.'pat  > tol.'la:.pat 
            four        four each 


7.9 Nasal Substitution 


In Austronesian languages, nasal substitution is a very common morphophonological process (Blust, 
2004). In Bagobo-Klata, the following affixes replace the word-initial sound of a root (except 
/n, ŋ, j, w/): irrealis actor-voice (AV) m- and its realis counterpart (bon)n-, irrealis PV 
mem-/mom-, ir/realis B/LV tam/m-, and instrument nominalizer tam-. 


(55) m- +    bol.'loj   > mol.'loj 
     IRR.AV  to give      IRR.AV-give 
             pan.'nek   > man.'nek 
             to climb     IRR.AV-climb 
             dod.'doŋ   > mod.'doŋ 
             to approach  IRR.AV-approach 
             'ha:.kaj   > 'ma:.kaj 
             to ride      IRR.AV-ride 
             tid.'duʔ   > mid.'duʔ 
             to stand     IRR.AV-stand 
(56) (bon)n- + m-(P)up. pat — nup. ‘pat 


RLS IRR.AV-jump RLS.AV-jump 
[m-()a."Paw => na. 'PIw 
IRR.AV-walk RLS.AV-walk 
m-(s)os.'sap => NISSIP 
IRR.AV-sip RLS.AV-sip 
m-(k)om.'mos => nommos 
IRR.AV-squeeze RLS.AV-squeeze 


m-(g)ayn. yet — nan. ‘net 
IRR.AV-bite RLS.AV-bite 


In (55) and (56), it is shown that eleven (11) Bagobo-Klata consonants can be replaced through nasal 
substitution and that homorgany is not required. 
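
The replacement pattern in (55) and (56) can be sketched as a single string operation. This is an illustrative sketch only: the function name is hypothetical, the exempt set {n, ŋ, j, w} is this sketch's reading of the exception list given above, and the treatment of exempt roots (returning `None`) is a placeholder, since the text does not say how such roots are affixed.

```python
from typing import Optional

# Assumption: initials that are not replaced, following the exception list in the text.
EXEMPT_INITIALS = frozenset({"n", "ŋ", "j", "w"})


def nasal_substitute(root: str, prefix: str,
                     exempt: frozenset = EXEMPT_INITIALS) -> Optional[str]:
    """Replace the root-initial consonant with the nasal prefix.

    No place assimilation is applied, reflecting the observation that
    homorganicity is not required. Roots are assumed to be consonant-initial.
    """
    if not root:
        raise ValueError("empty root")
    if root[0] in exempt:
        return None  # outside the scope of this sketch
    return prefix + root[1:]


print(nasal_substitute("pan.nek", "m"))  # man.nek
print(nasal_substitute("tid.duʔ", "m"))  # mid.duʔ
print(nasal_substitute("kom.mos", "n"))  # nom.mos
```

Because the prefix is inserted as-is, a labial m- can replace a coronal or glottal initial, which is the point made about homorganicity.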


7.10 Reduplication 

Reduplication in Bagobo-Klata occurs only when adjectives undergo intensification. Specifically, the 
penultima or antepenultima reduplicates, and regardless of the syllable structure, the reduplicated 
syllable is always open. 


(57) lo.'wa > lo~lo.'wa 
     wide     very~wide 
(58) pa.'ja > pa~pa.'ja 
     big      very~big 
(59) to.'woʔ > to~to.'woʔ 
     fat       very~fat 
(60) ma.li.gon.'naj > ma~ma.li.gon.'naj 
     beautiful        very~beautiful 
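
The open-syllable requirement on the reduplicant can be made concrete with a short sketch. It assumes, on the basis of (57) to (60) only, that the copied material is the onset and nucleus of the word's first syllable. The function names and the vowel inventory used for parsing are assumptions, and the closed-syllable input in the last line is a made-up string used to show coda truncation, not an attested form.

```python
VOWELS = set("aeiouɛɔə")  # assumption: symbols treated as syllable nuclei


def open_copy(syllable: str) -> str:
    """Return the onset plus nucleus of a syllable, dropping any coda."""
    for index, symbol in enumerate(syllable):
        if symbol in VOWELS:
            return syllable[: index + 1]
    raise ValueError(f"no vowel found in syllable {syllable!r}")


def intensify(word: str) -> str:
    """Prefix an open copy of the first syllable, marked with '~'."""
    first_syllable = word.split(".")[0]
    return open_copy(first_syllable) + "~" + word


print(intensify("lo.wa"))          # lo~lo.wa
print(intensify("ma.li.gon.naj"))  # ma~ma.li.gon.naj
print(intensify("kap.ta"))         # ka~kap.ta (made-up input)
```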


7.11 Cliticization 

Function words, especially those behaving like clitics, tend to attach, or cliticize, to a word called a 
host. In Bagobo-Klata, the words that often cliticize are non-personal nominal markers such as /ken/ and 
/nen/ and the relativizer /no/. The plural marker /be/ is also known to cliticize, but only to 
/ʔaŋŋaʔ/. 


(61) ho.'maP key=bo. ja?=ni. ja => ho. mak bo. jar ni. ja 
bad ABS=face=3SG.GEN 
“His/her face is ugly.” 


In (61), when /ken/ cliticizes to the word it follows, it becomes [k]. 
(62) m-(k)a. ‘Pa=Pu ney=pi. ‘hit > ma. ‘Pa Puy pi. ‘hit 


INTR.IRR-eat=1SG.ABS GEN=shrimp 
“I will eat shrimp.” 
(63) ‘ma:.liP=ya=lom. ‘ma? > ‘ma:.lin lom.'maP 
good=LKR=morning 
“Good morning” 


In (62) and (63), when cliticized, both /nen/ and /no/ become [ŋ]. 


(64) be=Pay. ‘ya? > bjay. ‘nar 
PL=child 
“Children” 


In (64), when the plural marker /be/ cliticizes to the word it precedes, only its initial sound /b/ is 
retained, while its vowel is deleted. The word-initial glottal stop in /ʔaŋ.'ŋaʔ/ then becomes a palatal 
sound. It must be noted, however, that /be/ cliticizes to /ʔaŋ.'ŋaʔ/ only. 


8 Historical Phonology 

This section discusses how the phonological system of Bagobo-Klata developed from PMP. Only the 
salient reflexes of PMP phonemes in Bagobo-Klata are explained. The proto-forms are based on Blust 
and Trussel (Ongoing). 


8.1 The Reflex of PMP *n 

PMP *n was retained as /n/ in Bagobo-Klata word-initially and -medially but was lost word-finally, as 
in (65). 

(65) PMP *bulan > bula ‘moon, month’ 
PMP *bulaw-an > blawa ‘gold’ 
PMP *dahun > daʔu ‘leaf’ 
PMP *duRi-an > duliya ‘durian fruit’ 
PMP *haRezan > ʔadda ‘ladder’ 
PMP *ipen > ʔippo ‘tooth’ 
PMP *qutin > ʔuti ‘penis’ 
PMP *quzan > ʔula ‘rain’ 
PMP *tian > tiya ‘belly’ 
PSP *libun > libu ‘woman’ 


8.2 The Reflex of PMP *R and *l 
PMP *R and *l merged as /l/ in word-initial and word-medial positions. 


(66) PAN *kaRi > kali ‘language, word, to say’ 
PMP *beRay > bolloj ‘to give’ 
PMP *kulit > kulit ‘skin’ 
PMP *laki > laʔi ‘man, male’ 
PMP *lesuŋ > lassuŋ ‘rice mortar’ 
PMP *Ratus > lotus ‘hundred’ 
PMP *Rabun > lawu ‘drizzling rain, mist; fog’ 
PMP *tuli > tuliʔ ‘earwax’ 
PMP *uRat > ʔolat ‘vein’ 
PMP *zuRuq > duluʔ ‘blood’ 
PWMP *Runaw > lunow ‘landslide’ 
However, in the word-final position, both *R and *l were phonemically lost: 


(67) PAN *qebel > ʔəbba ‘smoke’ 
PMP *bitil > witi ‘hungry’ 
PMP *hulaR > ʔula ‘snake’ 
PMP *gatel > gatto ‘itchy’ 
PMP *suqaR > haʔa ‘thorn’ 


However, when *R is metathesized, it is preserved, as in (68). 
(68) PMP *keseR > dallas ‘strength, force, vigor’ 
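
The positional statements in 8.1 and 8.2 can be summarized as two ordered rewrite rules. The sketch below is an illustration under stated assumptions, not a full account of the reflexes: the function name is hypothetical, proto-forms are written in plain ASCII, and only the loss of word-final *n, *R and *l and the merger of remaining *R with *l are modeled. Other changes described in this section (vowel shifts, gemination, *b > w, *s > h, glottal onsets, metathesis as in (68)) are deliberately left out, so many outputs are intermediate forms rather than attested words.

```python
import re


def final_loss_and_merger(proto: str) -> str:
    """Apply (1) loss of word-final *n, *R, *l and (2) *R > l elsewhere."""
    form = proto.lstrip("*")
    form = re.sub(r"[nRl]$", "", form)  # word-final loss (8.1, 8.2)
    return form.replace("R", "l")       # merger of *R and *l (8.2)


for proto in ["*bulan", "*libun", "*kaRi", "*kulit", "*hulaR"]:
    print(proto, ">", final_loss_and_merger(proto))
```

For *bulan, *libun, *kaRi and *kulit the output matches the reflexes listed in (65) and (66); for *hulaR the output is only an intermediate stage, since the attested reflex in (67) has a glottal stop onset.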


8.3 The Reflexes of PMP *b 
PMP *b has two reflexes in Bagobo-Klata: /b/ and /w/. Shown in (69) are examples of the retention of 
PMP *b. 


(69) *b>b 
PMP *batuk > batuk ‘cough’ 
PMP *baRah > balls ‘embers, glowing coals’ 
PMP *tekub > takkub ‘to cover’ 


As in the Téduray and Danaw languages, PMP *-b- became /w/ intervocalically in Bagobo-Klata. 


(70) *b>w 

PMP *besuR > woassu ‘satiated’ 
PMP *belay > walaj ‘tired, weary’ 
PMP *qabaRa > walls ‘shoulder’ 
PMP *qubi > ʔuwwi ‘yam’ 
PMP *tubuq > tuwuʔ ‘to grow’ 
PMP *tabaŋ > tawaŋ ‘help’ 
PMP *tabeq > tawaʔ ‘fat; grease’ 
PMP *ibeR > ʔiwwo ‘envy’ 
PPh *abaka > wars ‘Manila hemp’ 


8.4 The Reflexes of PMP *s 
PMP *s split into /h/ and /s/ only in word-initial and word-medial positions. 


(71) PAN *sikux > hiju ‘elbow’ 
PMP *pusun > puhuy ‘heart’ 
PMP *Rusuk > luhuk ‘ribcage’ 
PMP *sagebit > hawet ‘to hook on to something’ 
PMP *sakay > hakoy ‘to ride on something’ 
PMP *suat > huwat ‘to comb one's hair’ 


In word-final position, it was retained as /s/, which sporadically lenited to [h] through nominal or verbal 
affixation, as seen in 7.6. 


(72) PMP *hapejes > poddos ‘painful’ 
PMP *kemes > kommas ‘to squeeze’ 
PMP *tebus > tobbus ‘to redeem’ 
8.5 The Reflexes of PMP *e and *a 
PMP *a and *e merged and became /a/ and /9/ in Bagobo-Klata. This can be seen in both penultima and 
ultima. 


(73) PMP *batu > batu ‘stone, testicle’ 
PMP *ina > ʔino ‘mother’ 
PMP *manuk > monuk ‘chicken’ 
PMP *mata > moto ‘eye’ 
PMP *qatay > ʔotoy ‘liver’ 
PSP *igaʔ > ʔigaʔ ‘betel leaf’ 
PSP *samaʔ > hamaʔ ‘bad’ 


(74) PMP *beli > bolli ‘to buy something’ 
PMP *enem > ʔonnom ‘six’ 
PMP *lebeŋ > lobboŋ ‘to bury’ 
PMP *sejem > salom ‘sugar ants’ 
PMP *telu > tollu ‘three’ 
PMP *tebuh > tobbu ‘sugarcane’ 


There are instances in which PMP *e sporadically assimilated to PMP *a, as in (75). 


(75) *e>*o>a 
PAN *kaen > ka?a ‘to eat’ 
PMP *beRas > ballas ‘uncooked rice’ 
PMP *epat > Pappat ‘four’ 
PSP *ena? > Payyar ‘child’ 


Then, PMP *a sporadically assimilated to PMP *e, as in (76). 


(76) *a>ə 
PMP *hajek > ʔolok ‘to kiss’ 
PMP *qatep > ʔotap ‘roof’ 
PMP *tazem > talom ‘sharp and pointed’ 


Finally, shown in (77) are retentions of PMP *a. 


(77) PMP *qatin > Patiy ‘sweat, perspiration’ 
PMP *palaj > palad ‘palm (of a hand, foot)’ 
PWMP *bala(n)tik > blatik ‘a spring-set spear trap’ 
PWMP *balun > balu ‘provisions for a journey’ 
PPh *bakbak > pabbak ‘frog’ 
PSP *anat > Paynat ‘to wait’ 


8.6 Historical Gemination 

In Austronesian languages, consonant gemination has three well-established sources (Blust 1995:125-127): 
elimination of an articulatory gesture, syncope of an unstressed vowel, and consonant lengthening 
after schwa. All three sources can account for the historical development of gemination in 
Bagobo-Klata. 

First, shown in (78) are Bagobo-Klata geminates that developed through the elimination of an 
articulatory gesture (i.e., manner, place, or both) or the assimilation of homorganic or heterorganic 
consonants. In (78), the homorganic consonants *mp, *nd, and *nd respectively became /pp/ and /dd/ 
through regressive assimilation. 
(78) PMP *ampu > ʔappuʔ ‘grandparent’ 
PMP *empu > ʔappu ‘grandchild’ 
PMP *dandan > doddoy ‘near’ 
PMP *sampay > happoaj ‘to hang clothes’ 
PMP *tindes > tiddoas ‘to crush lice with the fingernails’ 


Then, in (80), the heterorganic consonants *pk, *sk, and *dk all became /kk/, while *ps and *kt became 
/ss/ and /tt/, respectively. 


(80) PAN *sepsep > sassap ‘to sip’ 
PAN *tektek > tottak ‘house lizard’ 
PMP *kaskas > kakkas ‘to scratch’ 
PMP *kepkep > kakkap ‘to embrace’ 
PMP *panahik > pannek ‘to climb’ 
PPh *kadkad > kokkot ‘to dig’ 


Second, Bagobo-Klata geminates also developed from clusters of non/identical consonants through 
vowel syncope followed by regressive or progressive assimilation, as in (81). 


(81) PMP *qalejaw > ʔoddow ‘day, sun (*lj > dd)’ 
PMP *qahelu > ʔallu ‘pestle (*hl > ll)’ 
PMP *bageqaŋ > baggaŋ ‘molar tooth (*gq > gg)’ 
PMP *sanelaR > halla ‘to fry rice (*nl > ll)’ 
PMP *tuqelan > tulla ‘bone (*ql > ll)’ 
PMP *tuq(e)lid > tullid ‘straight (*ql > ll)’ 
PWMP *benehiq > binniʔ ‘seedling (*nh > nn)’ 
PWMP *sen(e)qaw > haynow ‘steam, vapor (*nq >)’ 
PWMP *taq(e)ban > tabbar ‘tasteless, bland (*qb > bb)’ 
PPh *bulbul > bubbu ‘hair (of one’s body) (*lb > bb)’ 
PPh *luh(e)naw > lunnow ‘green, as vegetation (*hn > nn)’ 


Third, as seen in 8.5, PMP singletons became geminates usually after PMP *e and sometimes before 
and between PMP *e. 


(82) *e_ 
PAN *lemek > lommoak ‘soft’ 
PMP *bener > bonna ‘true’ 
PMP *betem > bottom ‘millet’ 
PMP *qelet > ʔallot ‘between’ 
PMP *tebuh > tabbu ‘sugarcane’ 
PMP *tenaq > tanya ‘half’ 
PPh *Rebag > labba ‘to collapse’ 
PPh *tebaR > toppoa ‘to answer (*R > ∅)’ 


(83) _*e 
PMP *dalem > dallom ‘inside’ 
PMP *liteq > lettaʔ ‘sap (of a tree)’ 
PMP *qulej > ʔullod ‘larva’ 
PMP *piqek > piyyak ‘baby chick’ 
PWMP *qali-matek > mattok ‘leech’ 
PSP *parek > pallok ‘sand’ 
It must be noted that not all PMP singletons before, after, and between PMP *e became geminates in 
Bagobo-Klata, as shown in (84). 


(84) PMP *benel > bayoa ‘deaf’ 
PMP *etaq > rata? ‘rice bran’ 
PMP *heyup > Pijup ‘to blow, using one’s mouth’ 
PMP *lisehagq > liha? ‘nit, egg of a hair louse’ 
PMP *pena > paya? ‘branch (of a tree)’ 


Finally, there are quite a few Bagobo-Klata words with geminates that cannot be accounted for by 
these sources, as shown in (85), although the singletons in *aNay, *duma, *kita, *talaw, *qalima, *baba, 
*hikam, *iak, and *laña probably geminated because of the merger of PMP *e and *a. Another curious 
case here is that of PMP *b and PPh *b, both of which became /pp/, as in bappa < PMP *baba and 
ʔapput < PPh *qabut. 


(85) PAN *aNay > ʔannoaj ‘termite’ 
PAN *duma > dummoa ‘other’ 
PAN *kita > kitta ‘to see, to show’ 
PAN *pitu > pittu ‘seven’ 
PAN *qalima > limmo ‘hand (cf. PAN *lima > limo ‘five’)’ 
PAN *talaw > tallow ‘fear’ 
PMP *baba > bappa ‘to piggyback somebody’ 
PMP *buni > bunni ‘to hide something’ 
PMP *iak > ʔiyvak ‘to cry out’ 
PMP *hikam > ʔikkam ‘mat (for sleeping)’ 
PMP *kutu > kuttu ‘louse (of one's head)’ 
PMP *laña > lanna ‘cooking oil’ 
PMP *qulu > ʔullu ‘head’ 
PMP *qupa > ʔuppa ‘hen’ 
PMP *puluq > pulluʔ ‘ten’ 
PMP *putiq > puttiʔ ‘white’ 
PMP *suRuq > hullu ‘to command, to send on an errand’ 
PPh *piqpiq > peppeʔ ‘to wash clothes’ 
PPh *qabut > ʔapput ‘to reach something’ 
PSP *bunal > buna ‘to beat somebody up’ 
PSP *danaw > danyow ‘handspan’ 


9 Conclusion 

Synchronically, Bagobo-Klata has 20 phonemic speech sounds—15 consonants and five vowels. 
Besides speech sounds, geminates, consonant clusters, and segmental stress are shown to be contrastive. 
Moreover, geminates and consonant clusters both occur in roots and are derived through 
morphophonological processes. These are crucial in understanding the internal structure of Bagobo- 
Klata words. Further data will help unravel and elucidate the complicated nature of the fortified forms 
of the glottal stop as well as the epenthetic forms triggered by the affixes -a and -o. Finally, by tracing 
the historical development of Bagobo-Klata phonemes from Proto-Malayo-Polynesian, the synchronic 
aspect of this phonological description has been calibrated to shed light on some of the synchronic facts 
that can be sufficiently explained by historical facts. 
Appendix A: Bagobo-Klata 1,026-Word List 


Bagobo-Klata English 
/p/ ‘pa:.tuk duck 
pa.ka.’ wan cup 
pa.’ Pet bitter 
‘pa:.?o foot 
‘pa:.na? bow 
pa.ha.gi.ra.paw.' wo youngest child in a family 
pa. lad lim.'mo palm (of one’s hand) 
pa. lad 'pa:.?o sole (of one’s foot) 
pa. li? wound, injury 
pa.'ja big 
‘pa: jun umbrella 
pe. ta? wet 
‘pe:.ko? 'pa:.?o back of knee 
pe. Ped to carry, bring, take with oneself 
pe. Pow lame, crippled 
pi.'lo how many, how much 
pi.jas.'su spear 
po still, yet 
"po:.toj quarrel; to fight someone 
po.'toj to die; to kill someone 
po.ko.'log how 
‘po:. PEs to open something 
po:.na? bait 
po.'no? branch 
po. los many 
‘pu:.tow iron (of metal) 
/b/ 


Bagobo-Klata 
pi. hit 

‘pi.hi? 

pu. Puy 
‘pu:.hu? 

‘pu:.hu nen ‘bu:.lig 
‘pu: .hun 
‘pu:.lut 

pu. wa:.las 
pak.'sul 

pan. ‘tug 

pab. ‘bak 

pat. tig 
pad.'de? 
pad.'do 

pan. ‘nek 
pay.'na 

pal.'lok 

pas. sik 

pep. pe? 

pip. pis 

pit.'tu 

pis.'so? 

pi. jak 

pot. 'toj 

pod.'dos 
pon.'no? 
pon.'nu? 
pon. no 

por. nu:.?o 
pol.lan.'na? 
pol.'lod 

pud.'du 

pul. ‘lu? 
pul.'lu? ho. ‘tu 
pul.’ lu? ?uw.' wo 
pul.'lu? tol.'lu 
pul.'lu? Pap. 'pat 
pul.'lu? lim.’mo 
pus.’ sod 

put. 'ti? 

put. tin Pob.’ buk 
‘pla:.ta 
English 

shrimp 

to pinch 

to wrap something up 
coconut heart 

banana heart 

heart 

to pick something up 
forest, woods 

a hole in the ground 
bladder 

frog 

to fracture, as a twig or one’s arm 
stinging pain 

price, debt 

to climb, as a mountain 
to miss someone 
sand 

also, too 

to wash clothes 

small (object) 

seven 

boil, abscess; to press or crush something 
chick, baby fowl 
firefly 

painful, sick 

full, to the brim 

turtle 

to finish; after 
chieftain 

born 

acrid, tart, sharp flavor 
gall bladder 

ten 

eleven 

twelve 

thirteen 

fourteen 

fifteen 

navel 

white 

gray hair 

silver 

wings 

phlegm 

can, could (ability) 


boat, larger than a canoe 
to shut, as a door 

cough 

bolo 

lungs 

to throw, toss, as a stone 
smell (general) 
Bagobo-Klata English 

‘ba:.?u ‘Pe:.ma? smell (as of underarm, body odor) 

ba.’ Pow north 

‘ba: .how thirsty 

ba. 'li:.ti? banyan tree 

ba.'li: jun axe 

‘ba:.sa to read 

ba.'sa pumpkin, squash 

ba.'ra:.tu cheap, inexpensive 

ba.'ra:.su? upper arm 

‘ba:.lu provisions (as for a journey) 

‘ba: jad to pay 

‘ba: jaw brother-in-law 

ba.'ju? cheeks 

‘ba:.we? medicine 

be ‘la:.gat ocean 

be pat. 'tad plain 

‘be:.kon coconut shell 

be.' kos to shoot, as an arrow from a bow 

‘be: .ne? to cry 

‘be: le to stay, remain 

be.'led river 

‘bi:.ta? tadpole 

‘bo:.tu testicles 

‘bo:.n1? rice, burnt lower crust 

bo.'n9 deaf 

bo. not pubic hair 

bo.'nu:.lo insane, mentally ill 

‘bo: .how arrow 

bo:.la? bubbles, foam 

‘bo:.log blind 

bo.'loj house 

bo.'jo? face 

bo.so.'mo? broken, not working or out of order 

‘bo:.wo? to pour, as liquid 

bu. ‘tun coconut palm 

bu.'na to hit someone with something 

bu. ‘nut coconut husk 

‘bu:.go to cook something covered in ash 

‘bu:.la moon 

‘bu:.lak flower 

‘bu: lig banana 

bu. ‘li:.joj drunk, intoxicated 

‘bu:.lu to cut something (e.g., hair) with scissors 

‘bu:.lud mountain 

bu.’sow stout mythical creatures that are believed to 
take children 

bu.’ wa? lanzones fruit 

ban. '‘tu:.gan famous 

ban. ‘ta? threat 

ban. ka? canoe 

ban.'kil canine tooth 

bab.'ba broken in pieces 

bap.'pa to carry someone on one’s back 
/t/ 


Bagobo-Klata 
bag.’ gan 
bal.'las 
bal.'len 
bal.'lo 
bal.‘lu 
baj.'jo 
baw.’ wow 
bik.'ko 
bin. ni? 
bil.'lod 
bob.'bo 
bot.ti.'ja 
bot.'tom 
bok.ka.'les 
bok.'ko? 
bok.'kon 
bok.’ kos 
bog.got. tow 
bol.'li 
bol.'lo 
bol.'loj 
bon.'na 
bon.no.' Pos 
bos.'sak 
bol.'la:.lu 
bub.'bu 
bun. ‘ni 
bun.')9 
bul. ‘las 
bul.'li 
buw.' wa 
‘bla:.wa 
ble: Pe 
‘bli:.bud 
‘bli:.la 
‘blo:.wa? 
‘*blu:.buk 
bja.' Pow 
‘bja:. 29 
blab.'ba? 
blun.'nus 


ta 

ta.'pe 
‘ta:.pi? 
‘ta:.do? 
ta.gis.’ kwi:.la 
ta.la.tu.'lu 
ta.'lum 
ta.ma.' wet 
ta.'me:.ya 
ta.'m1:.toj 
ta.'mi:.hin 
English 

jaw, molar tooth 

rice (uncooked) 

to return home 
embers, glowing coals 
widow 

crocodile 

high (object) 

necklace 

rice seed 

semen 

hole 

pregnant 

(foxtail) millet 

wound (scratch) 

not (constituent negation) 
shin (of one’s leg) 
fruit bunch (in a basket) 
adopted (child) 

to buy something 
poison 

to give something to someone 
true; to believe in someone 
loud, noisy 

earth, soil 

earthquake 

body hair; feather 

to hide 

fruit 

to change something 
night, evening 
hammock 

gold 

gecko 

whirlpool 

rafter 

spider 

dust 

drizzle, light rain 

year 

rice beer 

mouth 


in, on, at (oblique nominal marker) 
twin 

wall 

to drip, as from a faucet 

school 

teacher 

papaya 

hook, from which to hang something 
stove 

small bridge 

ring finger 
Bagobo-Klata 


ta.'mi:.luy 
‘ta:.lu 

ta.‘ ro] 
‘ta:.wan 
‘ta:.wis 
tak.pul.'lu? 


tak.pul.'lu? ?uw.' wo 


tak.tol.'lu 
tak. hi.'jow 
tak. lap.’ pat 
tak.li.'mo 
tak.lon.'nom 
tak.luw.'wo 
tak.wo.'lu 
te.'kod 
‘ter. PE 
‘tes. Po 
‘ti:.toj 
‘ti:.tu? 
‘ti.nut 

ti. Pi 

‘tL. Pow 
ti.'ja 

to.'po 
to.pod.'don 
to.bu. ‘lug 
‘to:.do 

to.’ Pi 
to.'muk 
‘to:.o 

‘to: .hu 
to.'li 
to.'lom 

to.’ wo? 
‘to:.wu? 
‘tu:.ban 
‘tu:.big 
‘tu:.di? 
‘tu:.duk 
‘tu:.kuk 

tu.’ Pud 

‘tur. uw 
tu.'la:.gon 
‘tu:.li? 
tu.'‘lu 
tu.'re?. kan 
‘tu.wo 
‘tu.wu 
tab.'la 
tam. 'ba:.ga 


tim. "pu mu. ‘la 


tug. sip 
tap.’ pe 
English 

shelter 

to defecate 

to hang up, as on a hook 
to help 

trousers (female) 
tenth 

twelfth 

third 

ninth 

fourth 

fifth 

sixth 

second 

eighth 

heel 

hard 

Job’s tears 
bridge 

puppy 

greedy 

little finger 

clear (as water) 
abdomen, belly 
to broil 

quiet, silent 
round 

to follow, as a trail 
to accompany 
mosquito 

nipple 

cucumber 

string 

sharp 

fat 

smart, intelligent 
in front 

water 

vagina 

oar 

lazy 

deer 

to string something together 
spine, backbone 
earwax 

to teach 

X do not know 
old (person) 

to grow 

board, wood plank 
copper 

rainy season 
bedbug 

old (object) 
Bagobo-Klata 
tab. bay 
tak.'ka 
tak.'ke? 

tag. ge? 
tan.'no? 
tay.'na 

tal.’ low 

tep.' po? 
ted.'do? 
tib.ba.pul.'lu? 
tib.ba.tol.'lu 
tib.ba. Pap. ‘pat 
tib.ba. uw.’ wo 
tib.ba.ho.'tu 
tib.ba.li.'mo 
tid.'do 

tid.'dos 

tik."ka 

tig.’ gan 

tig. gu? 
tim.'mos 

top. po? 


top. po:.kan nen ?ul.'lu 


tob. bi 

tob. ‘bin 

tob. bok 
tob.'bo? 

tob. bu 

tob. bus 
tot.tot.' tow 
tot.' tok 
tot.'to:.lu 
tok.'ko:. i 
tok. kub 
tog.gan.'ga? 
tod.'doj 
tod.'du:.wa 
tom.ma.'na 
ton. nob 

ton. No 

ton. non bul.'li 
tos.'so:.tu 
tol.'lu 
tol.'lu pul. ‘lu? 
tol.‘lu:.wo 
tuk.'kod 
tud.'du? 
tug.’ go 

tug. guy 
tul.'la 
tul.'lid 

tuw.' wan 
English 

bland 

to arrive 

to laugh 

urine 

snot, nasal mucus 
home 

to fear 

to spill, throw 

index finger; to point at someone 
only ten 

only three 

only four 

only two 

only one 

only five 

to stand 

to crush lice with one’s fingernail 
dry; to dry in the sun 
to cook rice 

to hit the mark 

to clean something 
to answer, reply 
headhunting 

to sew 

buttocks 

to stab 

coconut fruit (young) 
sugarcane 

to redeem 

self 

house lizard 

three each 

language 

to cover, put a cover on something 
parent 

fruit bunch (still on a tree) 
ancestors 

want (modal) 

honey 

half 

midnight 

one each 

three 

thirty 

two each 

cane, walking stick 
nape 

post, house pole 

to increase 

bone 

straight 

to drop, fall down 
/d/ 


Bagobo-Klata 
da.’ gow 
‘da:. Pit 
‘da:.?u 
‘da:.mu 
‘da:.la 
da.'luy 
dajt 
da.'jo 
‘de:.la? 
‘de: ja 
‘di:.pa ‘la:.gat 
‘di:.ta 
‘do:.?os 
‘do:.mo 
‘do: now 
do.'lid 
du. ‘li:.ja 
‘do:.lon 
‘du:.pu 
‘du:.lu? 
dat. 'toj 
dan. now 
dip.'po 
dit.'tu 
dik.'kot 
dim.’ mot 
dis.'so? 
dod.‘ don 
don. ‘nak 
dal.'lom 
dol.'lom 
dug.’ g9j 
dum.'mo 
dun. nuk 


‘ka:.pen 
‘ka:.pe? 
‘ka:.po 
‘ka:.bog 
‘ka:.tig 
‘ka:.da 

ka.’ ?a 
ka.?i.'la:.na 
ka.’ Pob 
ka.'si.du 
ka.hi.'ra:.pa 
ka.'la? 
ka.li.ban. bay 
ka.'ja? 
ka.'jang 
‘ka:.wad 
‘ke: led 
‘ke:.wo? 
English 

short (of time) 

name 

leaf 

dew 

path, trail 

low 

what 

west 

tongue 

to turn, as around a corner 
island 

name 

lime for betel nut chewing 
to remember 

to wake someone up 

root 

durian (Durio zibethinus) 
to honor, praise 

to chase away 

blood 

to carry on one’s shoulder 
handspan 

armspan 

to come near the speaker 
to stick, adhere to 

rice bran 

sty (in one’s eye) 

near; to approach someone 
broom 

inside; content 

heavy 

long time; slow 

other 

flood 


split bamboo 

to hold, grasp 

cold season, weather 
fruit bat 

outrigger 

each 

to eat 

need, must 

to lie face down 
knife (for all genders) 
difficult 

frying pan 

butterfly 

to lie on one’s back 
bright 

barb (as a fish hook) 
to lie on side 

short (of a person) 
Bagobo-Klata 
‘ki:.bot 

ki. 'tik 

‘ki: lat 

ki. ‘lid be.'led 
ki.'lid ‘la:.gat 
ki.'rop 
ko.to.'‘le:.bes 
ko.’mot 
‘ko:.no 
‘ko:.het 
ko.'lat 

‘ko: les 
ko.luk.'ku? 
‘ko:.li 
‘ko:.woj 
‘ku:.pa 
‘ku:.da? 
‘ku:.kan 

ku. 'li:.sap 
‘ku: lat 

‘ku: lit 
‘kuwlit ‘hu:.?uy 
‘ku:.luy 

ku. lun 

kam. ‘bin 
kas.'pa 
kum. ' bin 
kap.'pey 
kap.'pi 

kad. '‘di:.ru 
kak.'kas 
kal. ‘lay 
kaj.'ju 
kek.'kes 
kej. je 
kit.'to 

kid. '‘dut 

kik. ‘kis 
kin.'na 
kin.'na ‘bo:.gok 
kil. li 
kop.’ po 
kot.tol.'lu 
kot.'tu? 

kod. 'da:.la 
kod.’ dut 
kok.’ kop 
kok. kot 
kol.'la:.pat 
kol.'li:.mo 
kol.'lot 
kol.'los 
kol.'lu:.wo 
English 

civet 

small (as a young person) 
lightning 

riverbank 

seashore 

to blink one’s eyes 
diarrhea 

blanket 

rice (cooked) 

cave bat 

thin (of a person) 

to write 

dirty 

to say 

snail 

thick 

horse 

cockroach 

food particles stuck between teeth 
mushroom 

skin 

lip 

to snore 

back 

goat 

dandruff 

Jew’s harp 

to split into halves 
Philippine eagle 
cauldron 

to scratch up the soil, as a chicken 
scab 

tree 

to scrub the floor 
absolutive medial demonstrative (free) 
to see; to show 
astonishment 

to scrape, as a coconut 
have (existential verb) 
sick 

eel 

chest 

three times 

absolutive distal demonstrative (free) 
knowledge 

to pinch someone 

to embrace 

to dig 

four times 

five times 

curly (hair) 

strong 

twice 
/g/ 


Bagobo-Klata 
kom.'mi 
kom.'mon 
kom.'mos 

kon. ‘ni 
kon.'no 


kon.nod.'do? yo ‘la:.?i 
kon.nod.'do no ‘li:.bu 


kon. no 
ko.nod.' dow 
kos.'so 
kuk.'kus 
kut.'tu 
kul.'lo 
kla.’mat mo.'to 
‘kle:.wen 
‘kli:.tut 
‘kli:.gi? 
klo.'ni 
‘klu:.go? 
‘kjo:.hon 
kwa.'lo 
‘kwa:.ja 
klak.'kam 
klat. ‘tan 
klut.'tun 
klam.'mag 
klob.' bow 
klom.'mo? 
kjuw.'wa 


ga.’ bas 
ga.'ta? 

‘ga: how 

ga. hu? 

‘ga: li? 
‘ganas 
‘ge:.not 
‘ge:.hot 

ge. le? 
‘ge:.lok 
‘gi:.bo? 
‘gi:.bud 
gi.bul.'‘li 
gi.'ra 
gi.ra.paw.' wo 
gi.'ra nen ban.’ka? 
‘go:.bo 
go.'ton 
‘go:.la? 

‘gu: .hin 
gat.'to 
gak.'ka 
gaj.'jo 49 Pon.'nus 
English 

beard 

fist 

to wring 
absolutive proximal demonstrative (free) 
dream 

son-in-law 
daughter-in-law 
later, in a while 
yesterday 

once 

to scrub one’s body 
head louse 

pot for cooking 
eyelashes 
eyebrows 

anus 

hawk 

now 

wart 

crab 

earthworm 
bamboo 
ringworm, scabies 
a step in a ladder 
forehead 

star 

water buffalo 
tomorrow 

bee 


saw (tool) 

coconut milk 

ant 

easy 

to do, make 

door 

slow 

domestic pig 
correct 

tickle 

south 

fontanelle, hair whorl 
last night 

last 

younger sibling 
stern, back of boat 
to lie 

eggplant 

to sell 

water jar 

itch(y) 

to deceive someone 
wind from the south 
/ʔ/ 


Bagobo-Klata 
gam.'mi 
gen.'no 
gog.' got 
gol.'lot 
‘gka:.wan 
‘gna:.lo 
‘gla:.?u 
gla.’ ?ud 
‘glo:.puj 
‘gja:.wat 
gwal.'li 


‘Pa:.pay 
?a.pi. li:.du 
‘?a:.tad 
‘?a:.ta? 
‘?a:.tin 

?a. tin 
‘?a:.tu 

?a.' kap 


?a. gad ke.'la ma 


“Pa:.hos 
Pa. Pog 
Pa. Pow 
?a.'T1: jus 
Paw. wo: 
“Pa:.wa 
“?Pa:.wak 
"Pe:.pa 
‘?e:.pok 
‘?e:.ban 


?i. no 

?i.'no gwal.'li 
?i.. num 
‘Pina 

?i:.hi 

‘?i:.hip 
?i:.hin 
English 

to start 

earlier, recently 
a sharp weapon 
middle finger 
hip 

throat 

Adam’s apple 
east 

bee 

pimple 

across, opposite 


locust 

surname 

raft 

rice husk 

sweat 

if (conditional) 

to revenge 

monkey 

never 

garlic 

to take a bath 

to enter 

earring 

to weave 

rainbow 

waist 

to cross a river, road, etc. 
to cut off, sever, as a piece of rope 
left 

calf, of one’s leg 

to escort, bring someone somewhere 
tail 

feces 

yes 

to sit 

armpit 

turbid, murky; mud 
wasp 

to count 

to take care of someone/something 
sister-in-law 

to sleep 

betel leaf 

hot (as weather) 
mother 

aunt 

to drink 

envious 

to whet a blade 

to whisper 

ring 
Bagobo-Klata English 

?i.'la? to lie down 

?i.'las to slice (e.g., meat) 

?i.'lis to move to another location 
?i.' lus to massage 

Pi. ja? do not (negative imperative) 
?i:.Jup to blow, using one’s mouth 
Pi. jug coconut fruit (mature) 

‘Pi. Wo saliva 

29 or 

"?o:.puj flame 

?o.' bow mouse, rat 

"?o:.ta? to vomit 

?o.'top roof, thatch 

?o.'toj liver 

?o.'ka what-you-may-call-it 

?o. kap tree bark 

‘Po:.men to cook 

?o.mo father 

?o.'mo gwal.'li uncle 

‘?o:.hap to chew (sugarcane), bite off 
?o. hi salt 

‘?o:.hot rice straw 

“?o:. lat scar 

29. lat vein, root 

?o. le? and 

29.’ to choose, select 

?o. lok to kiss 

?o. low fence 

?o. lun to speak to someone 
‘Po:.jam yawn; to yawn 

?o. wak crow 

?o.’wu ash 

?u. pow bald 

“?u:.pus cat 

“Pu:.bi? to ask for, request 

Pu:.ti penis 

‘?u:.gan co-parent 

‘Pu:.gan ‘la:.?i father-in-law 

‘Pu:.gan ‘li:.bu mother-in-law 

“?u:.ho wild boar 

‘Pu:.la snake 

?u.'la rain 

Pu. lob knee; to kneel 

“Pu: lit to repeat; to do something again 
‘Pu: lin charcoal 

‘Pula to call 

?u. woj rattan 

Pan. ‘tap to think 

Pan.'da? not have (negative existential) 
?an.'da? non.’ non stupid 

?im.'po thing 

Pin.’ di? not (standard negation) 

Pin. di? to.'lom dull, blunt 
Bagobo-Klata 
?in.'di? ku.'wan 
?om.'bo? 

Pap. ‘pat 

Pap. ‘put 

Pap. pu? ?u.'lob 
Pap. pu? ‘la:.?i 
Pap. pu? ‘li:.bu 
Pas.’ su 

?at.' tap 
?ad.'da 
?ak.’kat 
?ak.'kan 

Pal. 'loj 
?am.'mi 
?am.'mu? 

Pan. 'noj 

Pan. nat 

Pan. na? 

Pan. na? gwal.'li 
Pan. net 
Pan. 9] 
?al.'los 

?as.'so 

?el.'los 

Pej. je 

?ip.po 

Pip. pod 

Pip. pod no ‘la:.?i 
?ip. pus 
?it.'tom 

?ik.' kam 

?ik.' kot 

?id.'di 

?id. dun 
Pin. Nd 

?il.'lo? 

?1j. jak 

Pop. po? 

Pop. pos 
Pop. pu 

Pop. pun 

Pop.’ pu Pu.'lob 
?ob. ‘bid 
?ob.'bo 
?ob.'buk 
Pot.'tad 

Pot. ‘tok 

Pot. tow 

Pot. tu 

Pot. ‘tut 
Pot.'tu? 

Pod. 'de 

Pod. 'dok 
English 

hoarse (of one’s voice) 
do not like (negative desiderative) 
four 

to reach something 
great-grandparent 
grandfather 
grandmother 

dog 

winnow, sift rice 
ladder 

to climb, as a tree 

owl 

chin 

right 

breast; to suck, breastfeed 
termite 

to wait for somebody 
baby 

nephew, niece 

to bite 

to get something 

to flow 

fish poison 

to masturbate 
absolutive medial demonstrative (bound) 
tooth 

spouse 

husband 

gasp, paint 

black 

mat, for sleeping 

rope; to tether an animal 
to boil, as water 

nose 

deaf 

mole, birthmark 

to shout loudly 
grasshopper 

to borrow 

thumb; grandchild 

to smell something 
great-grandchild 

roe, fish eggs 

smoke, as from fire 
hair 

to separate 

brain 

person 

to harvest 

flatulence 

absolutive distal demonstrative (bound) 
unripe (of a fruit) 

to pound, as rice 




/m/ 


Bagobo-Klata 
?od.' dow 

Pog. get 

Pog. gi? 
Pom.'mot 
?om.'m9j 
Pon.'nom 
?on.'nom pul. ‘lu? 
Pol. ‘lik 

Pol. lob 

Pol. ‘lot 

Pol. lon 

Pol.'lu 

Pon. ni 

Pon. ‘nop 

Pon. nus 

?on. ‘nus gid.'du ta ‘la:.gat 
Pup. pat 

Pup. po 

Pub. ‘bi 

Puk.'kat 

Pun.’no 
?un.no.paw. wo 
?os.'som 
Pul.‘lod 

Pul.'lu 

?ul.'lud 

Pus.'sa? 

Pus.’ sin 

uw. wi 

uw. wo 

?uw. wo pul.'lu? 
Puw. wo Pul.'li klo.'ni 
?uw. wo lo. ‘tus 


‘ma:.?a 

ma. ‘ha 
‘ma:.me 
‘ma:.si 
‘ma:.las 
‘ma:.li? 
ma.li.'ba:.?u 
ma.li.gon.'noj 
ma.lina.'nam 
ma. li:.nis 
ma. ‘ni? 
‘ma:.ray 
‘ma:.jow 
‘me:.ya 
‘mo:.gat 

mo. pow 

mo. to 

mo. ‘han 

mo. ?9 
English 

day, sun 

clothes 

cogon grass 

to catch 

rice plant, unhusked rice 
six 

sixty 

to thresh 

spring (of water) 
between 

neck 

pestle 

absolutive proximal demonstrative (bound) 
fog, mist 

wind 

monsoon wind 

to jump over, leap 
hen 

spiderweb, cobweb 
to strike with intention to hurt 
first 

older sibling 

sour; pomelo fruit 
larva 

head 

floor 

to put, place 

soot 

(purple) yam 

two 

twenty 

day after tomorrow 
two hundred 


sheath 

expensive 

leg 

salty 

spicy 

good 

fragrant 

beautiful, pretty, attractive 
beautiful 

clean 

peanut 

marang fruit 
raincloud 

fireplace 

mote, dirt in one’s eye 
shallow 

eye 

betel chew, quid 
betel, areca nut 




/n/ 


/ŋ/ 


Bagobo-Klata 

mo. nuk 

mo.'lo 

mo. ‘jo 

mu.’ ?ud 

mu.’ ?ud nen no. 'toj 
‘mu:.wu ken ‘la:.gat 
mar.’ ga 

map.'poy ?od.'dow 
mab. ‘bag 

mat. 'tok 

MES.’ SE} 

mob. ‘but 

mom. 'mis 

mon. ‘ni 

mos.'sod 

mos.'sok ?Pod.'dow 
mol.'lon 

mud.'du ney ‘pa:.?9 
mum.'m9 

mul.'lo 

‘mle:.?an 

mlo.'tus 

‘mlo:.?os 


‘na:.ta 
‘na:.do 
na.’ go? 
‘nat. ?e 
‘na:.Pow 
‘na:.?u 
‘na!.na 
na.’ nap 
na.'nam 
n€.' pes 
‘ni:.pa? 
ni:.dom 
no.'toj 
no. Pos 
‘no:.no 
“nu:.du 
nu.'ma 
nu. ‘lit yo Pod.'dow 
nu. wo 
naw. wo 
nej. j€ 
nob.'bo 
nod.'do? 
nom.'mo 
nuw. was 


‘qu:.?ob 
‘Qu:.?uj 
‘u:.ma 
English 

chicken 

light (in weight) 
young woman, lass 
spirit, soul 

spirit, of dead person 
high tide 

mango 

sunrise 

swollen, swelling 
leech 

industrious, hardworking 
animal 

sweet 

smell (as of fish) 
smell (as of urine) 
sunset 

dry season 

toe finger 

crumbs (i.e., rice or food) 
to plant 

rough, coarse 

one hundred 

smooth 


vegetable 

naked, bare 

mine 

intestine 

to steal 

to descend a hill 

to leave behind, abandon 

to crawl 

taste 

thin (of an object) 

nipa palm 

to think about something carefully 
corpse 

noise 

lobster, prawn 

why 

why 

day before yesterday 

to hear, listen 

to love someone; to breathe 
genitive distal demonstrative 
field 

to live at, reside at 

to search for 

accustomed 


fingernail 


whistle 
story 
/s/ 


Bagobo-Klata 
nit.'ton 
nil.'lo 


sa.lu.'wa 
si.'la? 

si. ‘la? kaj.‘ju 
"$01. Pa 
sa.lup.'pan 
saj. jow 

Sos. Sop 


‘ha:.pu? 
ha.' but 
‘ha:.koj 
ha.’ go? 
‘ha: lud 
ha.'jup 
ha.' wet 
‘he:.le 
‘hi:.pa? 
hi.'kat 
hi.'to 
hi.'la 
‘hi:.mat 
hi.'ja 
hi.'jow 
hi.'ju 

ho 

ho.’ po:.toj 
ho.’ bow 
ho.to.' Pi 
ho.'tu 
ho.'tuy ‘mla:.la 
‘ho:.?a 
ho.m9?.'ba:.?u 
ho.mo.bo.'jo? 
ho.'mo? 
ho.'’mu 
‘ho:.lo 
ho.' £90? 
“hu:.ki? 
‘hu:.ko 
‘hu:.du? 
‘hu:.9j 
‘hu:.lat 
‘hu:.lud 
hu.' wat 
hu. win 
hap.’ p9j 
had.'do 
ham.'mo 
hid.'du? 
hik.'ko 
English 
dark 
ear 


trousers (male) 
sweet potato 
cassava 

ankle 

loincloth, g-string 
to dance 

to suck, sip 


meat 

sieve, strainer 

to ride 

I 

to catch water with a pail 
wrong, incorrect 

to hang something up, not on a hook 
to lean 

to kick 

fast, quick 

we (inclusive) 

they; absolutive plural nominal marker 
needle 

s/he 

nine 

elbow 

absolutive singular nominal marker 
enemy 

(breast)milk 

companion 

one 

one thousand 

thorn 

bad smell, stink 

ugly 

bad, evil 

you (plural) 

[ge 

to stop 

bamboo water container 

anger, angry 

to carry on one’s head 

horn (of an animal) 

to write something 

to comb for lice with a fine tooth comb 
to comb one’s hair 

beak 

to hang clothes on a clothesline 
to know 

we (exclusive) 

to spit 

you (singular) 
Bagobo-Klata 
hin. 

his.’ sip 
his.'sib 

hob.' be? 

hob. ‘bot 
hon.'ni? 

hon. now 
hud.'du? 
hud.du.'?a 


hup.'pu ney ‘bu:.lud 


hal.'la 
hug.'gu 
hul.'lu 


‘la:.ba? 
‘la:. bid 
‘la:.ta 
la.'ti? 
‘la:.gat 
‘las. ?i 
la.’ Pot 
‘la:.?u 
‘la: nu 
‘la:.nit 
‘la:.qub 
la. hat 
‘la: lak 
la. ‘le 
‘Ja: lig 
‘la: jay 
‘la:.was 


‘la:.was nen kaj.’ju 


la.’ we 

‘la:.wu 

la.’wu 

le."pes 

‘le:.pos 

‘le:.pos gwal.'li 


li.’mo 

li.'mo pul. ‘lu? 
‘li:.mut 
‘li.now 

‘Tis jun 

lo. ‘tus 

lo:.kat 
English 

to blow one’s nose 
louse (of a chicken) 
fat, grease 

dry stream 

grass 

knife (for all purposes) 
steam, vapor 

young man, lad 
hiccough 

summit 

to fry rice 

to push 

to command, send on an errand 


brave, fearless 
ringworm 

taro 

swamp, wetland 

sea 

man, male 

to carry with both hands, as a heavy load 
to bail water, as from a canoe 
sad, unhappy 

sky 

cave 

all 

friend 

to run 

happy, joyful 

to fly 

body 

tree trunk 

high (of a person) 
cloud 

to fall (from a higher place) 
knife (for women) 
sibling 

cousin 

yellow 

waterfall 

torch 

narrow 

woman, female 

fire, to burn 

louse egg 

to look back 

five 

fifty 

to forget 

lake 

to turn around 
hundred 

to open up one’s eyes 
Bagobo-Klata 
‘lo:.ku? 
lo.’ Pow 
‘lo:.ma? 
lo.'met 
‘lo: nod 
lo."quj 
‘lo:.la 
‘lo: job 
‘lo:.jod 
lo.’ wa 
lu.'tow 
‘lu:.tu? 
‘lu:.ka 
lu.'ga 
‘lu: .huk 
‘lu:.mut 
‘lu:.na 
‘Tu:.na 
lu.'lu 
‘lu:.jo 
‘lu:.wa 
‘Tu:.qu 
lug.'wa? 
lab.'bus 
lak.'ka? 
lag.’ gam 
lag.’ gas 
lam.'ma 
lam.'mas 
lam.'mi 
lan.'na 
lan.'no 
lan.'1ow 
lal.‘ lad 
lal.'lom 
laj.'ju 
law.’ wow 
let.'ta? 
les.’ set 
les.'set ken ‘hu:.?uy 
les.'sen 
lip.’ po 
lid.'do 
lik.'kut 
lit.'ti 
lig.’ ga? 
lim.'mo 
lin.’ nu 
lis.'so 
lop.’ puk 
lob.'bo 
lob. ‘bon 
lot.'tu kod.’ dow 
English 

shirt 

to walk 

knife (of a datu) 

to burp, belch 

to come back 

to swim 

nest 

weak 

to pull, drag 

wide 

to float 

ripe 

bowl 

place 

ribs 

moss, algae 
placenta (afterbirth), pillow 
shade 

to broil something over coals 
ginger 

ladle, as of coconut shell 
coffin 

to go out, exit 

poor 

jackfruit 

bird 

to wash one’s face or hands or dishes 
fever 

to sow seeds 

new 

pus 

cooking oil 
housefly 

to move (be in motion, not be still) 
deep 

far 

tear(drop)s 

sap, resin 

to tear, rip 

harelip 

to annoy someone 
mold, mildew 
slippery 

tight 

thunder 

red 

hand 

shadow 

sesame 

all gone, consumed 
to collapse, cave-in 
to bury, inter 

noon, midday 
/j/ 


/w/ 


Bagobo-Klata 
lod.' dog 
lok.'ko 
lok.’ kod 
lom.'mod 
lom.'mo? 
lon. not 
lon.'na? 
los.'suy 
lud.'do 
lud.'duy 
lud.'dus 
lun. ‘nod 
lun. ‘now 
lul.'‘lut 
lus.’ suk 
lus.'suj 
luw.' wu 
lug.'wa? 


‘ju:.pa 
jab.ba?.'na:.pu 
joj. Jo 


‘wa:.loj 
wi. ti 
"W9!.?9 
wo. lu 
wal.'lo 
wod.' dow 
wos. SU 
English 

rotten 

granary 

to blame; to regret, be sorry 
to swallow down 

morning 

mortar, for pounding grain 
to flee, run away 

to ask a question 

cotton 

to lose one’s way 

fish 

to drown 

green 

to cook something in bamboo 
bead (as for jewelry) 

green 

winnowing basket 

outside 


centipede 
nightmare 
shame 


tired, weary 
satiated 

abaca fibers, hemp 
eight 

shoulder 

afternoon 

hungry 
Abstract 

Register is a suprasegmental contrast cued by a combination of voice quality, vowel quality 
and pitch found in many languages of Southeast Asia. Conventionally, registers develop 
from the transphonologization of historical onset voicing contrast; however, there are 
several Austroasiatic languages which have developed apparent register contrasts that 
correspond to historical vowel quality contrasts. The term pseudoregister is introduced 
here to refer to such contrasts. A general model for the formation and evolution of 
pseudoregister is proposed and, as quantitative phonetic analysis of pseudoregister 
languages is sorely lacking, the results of a preliminary study on the acoustic correlates of 
pseudoregister in Pacoh (< Katuic) are presented. F1 is shown to be the only acoustic 
correlate which reliably covaries with pseudoregister in Pacoh, while subtle differences in 
F0 and spectral tilt (H1*-H2*) are also detectable. 


Keywords: historical phonology, phonetics, register, Austroasiatic, Katuic 
ISO 639-3 codes: pac, tto, tth, irr, oog, sed, cog 


1 Introduction 

The Pacoh language of Vietnam and Laos has been described as a register language (Watson 1964, 
1966, 1996; Watson et al. 1979, 2013; Diffloth 1982; Sidwell 2005; Alves 2006; Gehrmann 2015). 
Register is a binary suprasegmental contrast, upheld by a combination of voice quality, vowel quality 
and pitch cues. Register, in its conventional form, develops under conditioning from the historical 
voicing specification of onset consonants and emerges in conjunction with the loss of said voicing 
contrasts (i.e., registrogenesis) (Huffman 1976). However, the distribution of register in Pacoh is 
unrelated to historical onset voicing, corresponding instead with historical vowel quality contrasts 
(Diffloth 1982, Sidwell 2005, Gehrmann 2015). As such, Pacoh, a language from the Katuic branch of 
Austroasiatic, is part of a small class of languages which I propose be called pseudoregister languages. 
Other documented pseudoregister languages include languages of the Bahnaric and Pearic branches of 
Austroasiatic and one sister language of Pacoh in the Katuic branch: Ta’oi (Sidwell 2015, 2019; 
Gehrmann 2015, 2019). 

Unfortunately, apart from the well-documented Chong language (< Pearic) (L-Thongkum 1991, 
Edmondson 1996, DiCanio 2009), acoustic phonetic descriptions of pseudoregister are sorely lacking. 
Pacoh phonology has been described qualitatively by Watson (1996), but there has been no acoustic 
phonetic investigation of the phenomenon in this language. In this paper, I will demonstrate using a 
small set of archived Pacoh word list recordings that differences of vowel height (F1) are by far the 
most reliable acoustic correlates of the Pacoh pseudoregister contrast. Nevertheless, in addition to F1, 
there are secondary correlates, including a relatively lower fundamental frequency (F0) and spectral tilt 
(H1*-H2*) in the high pseudoregister. 

This investigation into the acoustic correlates of the pseudoregister contrast in Pacoh begins the 
process of establishing a quantitative baseline description of the Pacoh pseudoregister phenomenon. 
Furthermore, the results of this preliminary study have important implications for a general model of 
pseudoregister formation. Two primary typologies of pseudoregister are apparent in the pseudoregister 
languages described thus far, but Pacoh pseudoregister does not match either of these prototypes. A 
general model of pseudoregister formation is proposed here, based on the patterns of pseudoregister 
emergence and evolution among all of the pseudoregister languages documented thus far. 


2 Pacoh 

Pacoh is spoken primarily in the Thừa Thiên Huế province of Vietnam and adjacent areas of Salavan 
province, Laos. Speakers of Pacoh are included under the officially recognized Ta Oi ethnic group in 
Vietnam and, indeed, some speakers of Pacoh refer to their language as Ta’oi.¹ The language is not 
particularly well documented apart from the dialect which is the focus of this study, High Pacoh 
(hereafter, simply Pacoh). Phonological descriptions (Watson 1964, 1966, 1996), grammatical 
descriptions (Alves 2006, 2007, 2015; Watson 1977, 2011; S. Watson 1964, 1966, 1976; Watson et al. 
2013) and a sizable dictionary (Watson et al. 2013) are available for the High Pacoh dialect. The 
phonology of the Cado dialect has been described (Gehrmann 2015, Vitrano-Wilson et al. 2018) and 
brief descriptions of two varieties referred to as Pakéh and Tadih are also available (Nguyen et al. 1986). 
Another variety, Bahi, is mentioned in passing by Watson (1996). 

Watson’s characterization of pseudoregister in Pacoh evolves over three decades from the 1960s 
to the 1990s until, in a final paper on the topic, Watson (1996) presents his conclusions on the matter. 
Having been influenced by Gregerson’s (1976, 1984) hypothesis that differential tongue root position 
underlies phonological register in Southeast Asia and by his own experience working with African ATR 
harmony languages, Watson settled on a bifurcation of Pacoh vocalism into two registers: a tongue root 
retracted (+RTR) register and a tongue root advanced (-RTR) register. The +RTR register is reportedly 
the more marked of the two and is described as tense or pharyngealized. Watson draws parallels 
between Pacoh register and register in Sedang, both of which are described as having the “... ‘creaky, 
raspy’ vowels of a retracted tongue root articulation.” (Watson 1996, 200). Hereafter, we will refer to 
the tenser, +RTR register as high pseudoregister and the laxer, -RTR register as low pseudoregister. 

Watson’s ultimate interpretation of Pacoh vocalism divided the language’s 30 vowel phonemes 
into fifteen vowel qualities doubled for register contrast. This inventory is presented in Table 1, with 
slight alterations in the spelling of phonemes compared to Watson’s (1996) inventory. 


Table 1: Pacoh vocalism 
/ ia# io! #aH io! uaF uol 
wok fF oak WF ont wok yA yk yA yl 
eH el aH al oF ob eH eb aH al oF of / 


[ ia id ta #0 ua uo 
e to &k 9: Uu e io. 24 9 U 
: ££ a + Dp oO ze € a3 vd o | 


3 Experiment 

The phonetic correlates of the proposed Pacoh pseudoregister contrast are currently unknown, apart 
from Watson’s qualitative description. Here, I present an introductory acoustic study on Pacoh 
pseudoregister, the goal of which is to confirm on empirical grounds, if possible, the pseudoregistral 
interpretation of Pacoh vocalism. The data was segmented and annotated in Praat (Boersma & Weenink 
2020) and acoustic measures were extracted using the PraatSauce script (Kirby 2020). The resulting 
data was analyzed in Python and graphical representations of the data were produced using the plotnine 
package (Kibirige et al. 2021).² Unfortunately, only a small amount of data was available for this study, 
which amounts to 303 words spoken one time each in isolation by a single male speaker. The recordings 
were made on tape by Watson, working with a native speaker in 1972.³ Not every vowel in the Pacoh 
vocalic inventory is represented in this data set and the general paucity of data precludes the possibility 
of addressing this issue definitively here and now. 


1 Not to be confused with the Ta’oi language [tto, tth, irr, oog] spoken in Laos, which is a separate Katuic 
language (Watson 1996). 
2 Plotnine is based on the ggplot2 package for R (Wickham 2016). 
3 I wish to express my gratitude to Dick Watson for sharing these audio files with me for this project. 


3.1 Vowel Quality 

Watson (1996:200) explains that high pseudoregister vowels are phonetically more open than 
corresponding low pseudoregister vowels, and that the degree of vowel height difference “increases 
from front to back”. The close back high pseudoregister vowel /u(:)ᴴ/ is so much more open than its 
low pseudoregister counterpart, in fact, that it is found at a more open vowel height position [o(:)] than 
the low register non-close vowel /o(:)ᴸ/ [o(:)]. This is asymmetrical with respect to the corresponding 
non-back vowels (cf. Table 1), but the explanation for this lies in the 
relatively recent retraction of a historical mid vowel, *a(:), which has disrupted the relationship between 
vowel height and pseudoregister in the Pacoh back vowel inventory (Gehrmann 2015, 2019). 

Figure 1 presents a graphical representation of mean F1 values for each token across the middle 
50% of vowel duration (i.e., 0.25 < t < 0.75 with respect to regularized time). Low pseudoregister 
vowels are indeed consistently closer in vowel quality than their high pseudoregister counterparts and 
Watson’s observation that /u(:)ᴴ/ is unexpectedly more open in vowel quality than /o(:)ᴸ/ is confirmed. 
This suggests that all of the pseudoregister vowel pairs would be differentiable one from the other based 
solely on vowel quality measures, even in the absence of additional, co-varying register cues (see the 
F1-F2 charts in Figure 2). 
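
The averaging window described above (the middle 50% of each vowel, i.e., 0.25 < t < 0.75 in regularized time) can be sketched in Python. This is a minimal illustration under stated assumptions, not the script used in this study: the function name, the (token, time, F1) row layout and the sample values are all invented for the example.

```python
from collections import defaultdict
from statistics import mean


def mean_f1_middle_half(rows, lo=0.25, hi=0.75):
    """Average F1 per token over samples with lo < t < hi (regularized time).

    `rows` is an iterable of (token_id, t, f1_hz) tuples; this layout is an
    assumption made for illustration, not the PraatSauce output format.
    """
    by_token = defaultdict(list)
    for token, t, f1 in rows:
        if lo < t < hi:  # keep only the middle 50% of the vowel
            by_token[token].append(f1)
    return {token: mean(values) for token, values in by_token.items()}


# Invented measurements for illustration only; not data from the study.
rows = [
    ("tok1", 0.1, 400.0), ("tok1", 0.3, 500.0), ("tok1", 0.5, 520.0),
    ("tok1", 0.7, 540.0), ("tok1", 0.9, 450.0),
    ("tok2", 0.2, 300.0), ("tok2", 0.4, 310.0), ("tok2", 0.6, 330.0),
]
result = mean_f1_middle_half(rows)
print(result)
```

Discarding the first and last quarter of each vowel is a common way to reduce the influence of consonant transitions on formant averages.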


Figure 1: F1 measurements for vowels (panels: Long, Short) 


Papers from SEALS 30 — Gehrmann 


Figure 2: F1-F2 charts (x-axis: F2 in Hz) 

3.2 Pitch and Voice Quality 

Fundamental frequency (F0) and spectral tilt (H1*-H2*), one of the frequent acoustic correlates of voice 
quality differences, do co-vary with pseudoregister here, though the effect is rather weak. Low 
pseudoregister is marked by higher F0 and H1*-H2* measures on average and high pseudoregister is 
accompanied by lower F0 and H1*-H2* measures on average. Figure 3 demonstrates this. The falling 
H1*-H2* contour indicates an increase in laryngeal tension toward the end of the vowel duration. This 
coincides with a decrease in F0, which indicates that glottal pulses are becoming elongated and more 
irregular. We would expect to see this if laryngealization were increasing towards the end of the rime, 
as is often the case in high pseudoregister words (more on this below); however, the degree of 
laryngealization indicated here does not approach the strong creak achieved toward the right edge of 
high pseudoregister words in Ta’oi and Sedang, for example (Gehrmann 2019, Smith & Sidwell 2015). 
Note also that the H1*-H2* measures in Figure 3 are all negative in value. Consequently, breathy voice 
quality is not indicated in this data and we are only dealing with degrees of modal to tenser-than-modal 
voice quality here. 
Figure 3: F0 and H1*-H2* over regularized time for all vowels by vowel length (panels: Long, Short) 

The trend evident in Figure 3 is not equally evident in all phonetic environments. While some factors, 
such as onset voicing, vowel height or vowel backness were not found to have a significant effect on 
this trend, coda manner of articulation does have a small effect. Figure 4 shows how the spectral tilt 
difference between the two pseudoregisters weakens or disappears in words with oral stop or voiceless 
fricative codas. Otherwise, in the high pseudoregister, spectral tilt measures are relatively low towards 
the right edge of words. 


Figure 4: F0 and H1*-H2* over regularized time by vowel length and terminance type 

Because the difference in F0 between the two pseudoregisters is small, one might suspect that intrinsic 
F0 is to blame for the higher F0 measurements in the low pseudoregister. All things being equal, it is 
expected that close vowels will have higher F0 than non-close vowels and, suspiciously, all phonetically 
close vowels in this language are in the low pseudoregister. Nevertheless, the same difference in F0 
between the two pseudoregisters is apparent for all monophthongs in Pacoh, as demonstrated in Figure 
5. This indicates that the intrinsic F0 effect is being overridden here, in service of the phonological 
pseudoregister distinction, to which F0 is a phonologized, extrinsic cue. 
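
The logic of this check, comparing F0 across the two pseudoregisters within each vowel rather than over the pooled data, can be sketched as follows. This is a hedged illustration only: the function name, the (vowel, register, F0) tuple layout and all numbers are assumptions invented for the example and do not reproduce the measurements behind Figure 5.

```python
from collections import defaultdict
from statistics import mean


def f0_difference_by_vowel(tokens):
    """Return mean F0 (low) minus mean F0 (high) for each vowel pair.

    `tokens` is an iterable of (vowel, register, f0_hz) tuples, where
    register is "low" or "high". Vowels attested in only one register
    are skipped, since no within-vowel comparison is possible.
    """
    grouped = defaultdict(lambda: {"low": [], "high": []})
    for vowel, register, f0 in tokens:
        grouped[vowel][register].append(f0)
    diffs = {}
    for vowel, regs in grouped.items():
        if regs["low"] and regs["high"]:
            diffs[vowel] = mean(regs["low"]) - mean(regs["high"])
    return diffs


# Invented values for illustration only; not data from the study.
tokens = [
    ("i", "low", 130.0), ("i", "low", 134.0), ("i", "high", 126.0),
    ("a", "low", 122.0), ("a", "high", 118.0), ("a", "high", 116.0),
    ("u", "low", 131.0),  # no high-register partner in this toy set
]
diffs = f0_difference_by_vowel(tokens)
print(diffs)
```

If the difference keeps the same sign within every vowel pair, vowel height alone cannot account for it, which is the reasoning applied to intrinsic F0 above.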
Figure 5: F0 over regularized time by vowel length and vowel height 

To summarize, it is clear that vowel quality differences are the primary and most reliable phonetic cue 
to the pseudoregister contrast, but there is also support for differences of pitch and voice quality playing 
a secondary role as redundant cues, especially in words ending with a sonorant or glottal(ized) coda. 
The low pseudoregister may be described as involving relatively close vowel quality and, optionally, 
relatively high, level pitch and relatively lax voice quality. The high pseudoregister, by contrast, is 
characterized by relatively open vowel quality and may optionally demonstrate relatively low-falling 
pitch coinciding with an increase in laryngeal tension towards the end of the vowel. 


4 Register and Pseudoregister 
The term register is used to describe various natural phenomena in linguistics, speech pathology and 
vocal pedagogy, but in Southeast Asia, the term is used to refer to a particular type of suprasegmental 
contrast. Even within the specific sub-domain of Southeast Asian phonology, however, register is not a 
well-defined concept. There is no established definition of the term, and different researchers have used 
the term differently. For some, phonological register may be used to refer to any suprasegmental 
contrast for which differential voice quality is considered the primary phonetic cue. Others are less 
inclusive, constraining register to only those voice quality-based contrasts which are cognate with the 
historical voicing of onset consonants. Some use the term register to refer to any tonal system in which 
differences of pitch and of voice quality work in tandem to mark tonal contrasts (i.e., a register-tone 
system) while others use register to refer to sets of tonemes which result from a phonemic tone split 
conditioned by historical onset voicing differences. 

In order to properly differentiate register from pseudoregister then, a working definition of register 
is needed first. I propose the following definition: 


Register: a binary, suprasegmental contrast upheld by a certain suite of naturally co-varying phonetic 
cues (or a subset thereof), which frequently arises to enhance and ultimately replace historical onset 
voicing contrast. 


This “suite of naturally co-varying cues” may be referred to collectively as the register bundle of cues. 
These include, most notably, voice quality, vowel quality and pitch, as summarized in Table 2. 


Table 2: The primary phonetic cues associated with register 


                High Register   Low Register 
Voice Quality   Modal           Breathy 
Vowel Quality   More Open       Closer 
Pitch           Higher          Lower 


I define register with overt reference to the historical conditioning environment in which it develops. 
Typologically similar contrasts which resemble register in terms of their phonetics and/or phonology 
are thus excluded if they do not arise under conditioning from differences of historical onset phonation. 
In addition to having a different point of origin, these other, register-like contrasts exhibit certain 
evolutionary potentialities which, based on the language evidence available, do not appear to be 
available for conventional register contrasts. These crucial differences in point of origin and ultimate 
evolution are why I propose the term pseudoregister, as a means of differentiating this phenomenon 
from register proper. 


5 Pseudoregister formation and evolution 

What, then, is the origin of pseudoregister if historical onset voicing is not involved? In all of the 
pseudoregister languages described thus far, pseudoregister evolved out of historical vowel quality 
differences, with phonetically close vowels conditioning low pseudoregister and phonetically open 
vowels conditioning high pseudoregister.⁴ This pattern is concordant with the well-known patterns of 
vowel height-register interaction found in conventional register languages, whereby close vowels are 
stable in the low register and open vowels are stable in the high register, but high register close vowels 
and low register open vowels are unstable and typically become restructured over time in terms of vowel 
quality (Huffman 1985, Gehrmann 2015). This leaves all phonetically close vowels as low register 
vowels and all phonetically open vowels as high register vowels. The fundamental association of close 
vowel height with low (pseudo)register and open vowel height with high (pseudo)register is clear, even 
if it is less clear why this should be so. 

A promising potential explanation casts tongue root position and larynx height as the articulatory 
gestures which underlie both register and pseudoregister. This would explain some of the intriguing 
parallels between binary register contrasts in Southeast Asia and binary tongue root harmony contrasts 
found elsewhere in the world. We may well hypothesize that tongue-root harmony languages and 
register languages are both drawing on the same suite of naturally co-varying phonetic cues associated 
with the expansion or reduction of supraglottal cavity volume, as Gregerson (1976, 1984) suggests. 


Table 3: Examples of Pseudoregister formation in Rengao 


Bahnar Rengao Bahnar Rengao 

PB *e: PB *i: 

babe: goat babi:# ~— goat bri: woods bri} wild (forest) 
re: rattan jae rattan | jri: banyan tree jri:& banyan tree 
kane: rat kani:# rat Si: louse eit louse 

Pake: horn kus antlers | ti: hand ti! hand 

PB *o: PB *u: 

Po: bee Port wasp | tun carry to:nt carry 

"lon tree lo:n" wood | kun ladder go:nt stairs 

bo: casket bo:n# coffin | sun axe co:nt axe 

gon beat gong _go:n" gong | jun stand up jot sit up 

PB *a PB *a 

nam go nam" go ka’nam under ka’nam! under 
padam five padam! five hatep _— dig hole tanap’ — bury 

?akan woman kan# female | bat make adam bat" dam 

mat eye mat# eye Pat hold breath — at’ stop breathing 
jran house post ran" post glok drown glak* drown 

man night man night | katey hear tant hear 


By way of example, Table 3 demonstrates the formation of pseudoregister contrast out of historical 
vowel height contrasts in Rengao, a North Bahnaric language, with reference to the more conservative 


4 See examples from the Bahnaric (Smith 1972, Sidwell 2015), Katuic (Diffloth 1982; Sidwell 2005; Gehrmann 
2015, 2019) and Pearic (Sidwell 2019) branches of Austroasiatic. 
Bahnar language (< Central Bahnaric), which retains the Proto-Bahnaric vowel contrasts in these 
examples.⁵ 

A survey of the pseudoregister languages described thus far reveals two basic typologies: 
lax-marked pseudoregister and tense-marked pseudoregister (Gehrmann 2015). In lax-marked 
pseudoregister languages, the low pseudoregister is the more marked of the two in terms of voice 
quality, being realized with breathy voice. In tense-marked pseudoregister languages, it is the high 
pseudoregister that is more marked, being characterized by creaky voice / laryngealization. The North 
Bahnaric languages testify to the fact that historically cognate pseudoregister contrasts may be realized 
synchronically as either lax-marked or tense-marked. The Sedang language has a tense-marked 
pseudoregister contrast but cognate contrasts in other North Bahnaric languages such as the previously 
mentioned Rengao are lax-marked. This suggests that unlike conventional register, the phonetic voice 
quality cues associated with pseudoregister exist along a continuum of laryngeal tension, as illustrated 
in Figure 6. 


Figure 6: Lax-marked vs. tense-marked pseudoregister on a continuum of laryngeal tension 

Lax-Marked Pseudoregister: /high/ [modal], /low/ [breathy] 
Tense-Marked Pseudoregister: /high/ [creaky], /low/ [modal] 

That is not to say that pseudoregister languages move back and forth along this laryngeal tension 
continuum at random, however. On the contrary, a historical progression from more conservative 
lax-marked pseudoregister typology to a more innovative tense-marked typology is evident. Firstly, 
lax-marked pseudoregister languages retain conservative differences in vowel height between 
pseudoregister vowel pairs as a phonetic cue to the pseudoregister contrast, even as new cues of pitch 
and/or voice quality come in to enhance the original vowel quality contrast. This is not true of 
documented tense-marked pseudoregister languages, in which historical vowel quality differences have 
been erased by full phonetic merger of vowel quality while the job of cueing the pseudoregister contrast 
has been assumed by differential pitch and/or voice quality. Secondly, all documented tense-marked 
pseudoregister languages have in common the loss of historical onset voicing contrasts, while onset 
voicing contrasts have been retained in all documented lax-marked pseudoregister languages. This 
strongly suggests that tense-marked pseudoregister languages are formerly lax-marked pseudoregister 
languages that have undergone a general tensing of the pseudoregister contrast in conjunction with the 
devoicing of onsets. 

The catalyst for the general tensing of pseudoregister then is actually registrogenesis. 
Consequently, register and pseudoregister may in fact occur together in a language, intertwining into a 
complex, tone-like, register-pseudoregister hybrid as has been well documented in Chong (Ferlus 1979, 
2011; Headley 1985; Sidwell 2019). The possibility of such interactions between register and 
pseudoregister is another argument for disambiguating the two phenomena terminologically, as I 
propose here. 

When a pseudoregister language undergoes registrogenesis, the devoicing of voiced stop onsets 
begins to impart low register cues. As a result, four combinations of register and pseudoregister are 
produced as illustrated in Table 4. Focusing now on voice quality, a problem arises when the two “low” 
categories combine. Low pseudoregister has no room to become laxer, as it is already associated with 
breathy voice quality on the lax end of the laryngeal tension continuum (see Figure 6). However, high 
pseudoregister, being associated with modal phonation, does have room to become tenser and move 
into the creaky/laryngealized end of the spectrum. This is exactly what we find in Chong and other 
tense-marked register languages, where the historical high pseudoregister shifts tenser to 
creaky/laryngealized and the historical low pseudoregister shifts tenser to modal voice. As a result, 
breathy voice is removed as a cue to the low pseudoregister category and becomes associated instead 
with the nascent low register category. 


5 Bahnar data from Banker et al. (1979), Rengao data from Gregerson & Gregerson (1977), pBahnaric 
reconstructions from Sidwell (2011). 


Table 4: General tensing and the register-pseudoregister interaction 


                          Register 
                          High        Low 
Pseudoregister   High     [creaky]    [breathy-creaky] 
                 Low      [modal]     [breathy] 


The pattern in Table 4 describes exactly the pattern of development seen in the Chong register- 
pseudoregister complex. However, in the other documented tense-marked pseudoregister languages, we 
find a simplification of this scheme. In Ta’oiq and Sedang, the pseudoregister contrast remains intact 
and has undergone the same general tensing seen in Chong, but there is no modern reflex of 
conventional register. Historically voiced stop onsets have been devoiced and fully merged into the 
historically voiceless stops, but there is no register contrast crosscutting and splitting the pseudoregister 
contrast in these languages. Register must have at least begun to develop in these languages in the past 
in order to catalyze the general tensing of pseudoregister, but register either failed to trigger a phonemic 
split in pseudoregister in these languages as it did in Chong or, if it did do so in the past, the expected 
4-way contrast has since simplified to a 2-way contrast. Table 5 illustrates this state of affairs in Ta’oiq 
and Sedang. 


Table 5: General tensing without phonemic register (as in Ta’oiq and Sedang) 


                          Register 
                          Neutralized 
Pseudoregister   High     [creaky] 
                 Low      [modal] 


6 What Kind of Pseudoregister Language is Pacoh? 

Pacoh pseudoregister is unique. It does not fit comfortably in either the lax-marked or the tense-marked 
pseudoregister typology. Like tense-marked pseudoregister, Pacoh has undergone onset stop devoicing 
and a general tensing of the pseudoregister contrast; however, unlike tense-marked pseudoregister, 
vowel height remains a reliable phonetic cue for the pseudoregister vowel pairs in Pacoh. The phonetic 
merger of vowel quality among pseudoregister pairs, which is characteristic of other documented tense- 
marked pseudoregister languages, has not occurred in Pacoh. In fact, vowel quality differences are 
unexpectedly the most reliable cue to the pseudoregister contrast here, while the obvious difference of 
voice quality between high and low pseudoregisters which is documented in Ta’oiq, Sedang and Chong 
(i.e., laryngealization or lack thereof, respectively) is not found in Pacoh. The voice quality difference 
between the two Pacoh pseudoregisters is detectable, as demonstrated in Section 3, but much more 
subtle. Pitch differences are likewise subtle. 
Based on the above discussion, the expected pattern of pseudoregister formation and evolution may be 
summarized as shown in Table 6. 


Table 6: A model of pseudoregister formation and evolution 


Stage           | *D-      | Voice Quality   | Vowel Height        | Examples 
1: Lax-Marked   | Voiced   | Modal : Breathy | More Open : Closer  | N. Bahnaric⁶ 
2: Transitional | Devoiced | (Tense) : Modal | More Open : Closer  | Pacoh 
3: Tense-Marked | Devoiced | Creaky : Modal  | Phonetically Merged | Ta’oiq, Sedang, Chong 
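
The staging in Table 6 can be read as a simple decision rule over three properties of a language. The sketch below is a deliberately simplified rendering of that table: the function name and the three boolean parameters are hypothetical conveniences, and real languages need not fall cleanly into one cell.

```python
def pseudoregister_stage(onsets_devoiced, high_register_creaky, vowel_height_merged):
    """Assign a Table 6 stage label from three yes/no properties.

    Hypothetical simplification: each argument collapses one column of
    Table 6 (*D-, Voice Quality, Vowel Height) into a boolean.
    """
    if not onsets_devoiced:
        # Historical *D- onsets still voiced: conservative lax-marked type.
        return "1: Lax-Marked"
    if high_register_creaky and vowel_height_merged:
        # Devoiced onsets, creaky high pseudoregister, merged vowel height.
        return "3: Tense-Marked"
    # Devoiced onsets without the full tense-marked profile.
    return "2: Transitional"


# Property values below follow the rows of Table 6.
print(pseudoregister_stage(False, False, False))  # N. Bahnaric row
print(pseudoregister_stage(True, False, False))   # Pacoh row
print(pseudoregister_stage(True, True, True))     # Ta’oiq, Sedang, Chong row
```

Treating onset devoicing as the first test mirrors the claim that registrogenesis is the catalyst for general tensing.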


Pacoh would appear to offer us a rare glimpse at a language intermediate between the lax-marked and 
tense-marked stages of pseudoregister development. However, based on the results of this admittedly 
preliminary acoustic study of Pacoh, it seems unlikely that this language is actually evolving in the 
direction of a tense-marked pseudoregister language. Historical *D onsets have clearly devoiced, and 
the voice quality cues to the pseudoregister contrast have clearly shifted toward the tenser end when compared 
with the lax-marked pseudoregister languages of North Bahnaric. However, while this shift has moved 
the historical low pseudoregister away from breathy voicing, the historical high pseudoregister has not 
yet become associated with creaky voice as we would expect based on the known tense-marked 
pseudoregister languages. Rather than a stronger emphasis on voice quality and pitch cues and a 
diminished role for vowel quality, we find instead the marginalization of voice quality and pitch cues 
and a doubling down on vowel height as the primary and only reliable cue to Pacoh pseudoregister. 
This hints at the possibility that pseudoregister is, in fact, fading away in Pacoh as a contrastive property 
of the language’s phonology, with a reversion to contrasts of vowel quality. In other words, at an earlier 
phase, Pacoh entered into the pseudoregister formation process and saw historical vowel height 
contrasts re-interpreted as pseudoregister contrasts marked by a complex of cues from the register 
bundle which came in alongside differences of vowel height (i.e., Stage 1). However, the pseudoregister 
vowel pairs subsequently failed to coalesce in terms of vowel quality and remained quite strongly 
differentiated in terms of vowel quality, even as the language began the transitional phase with onset 
devoicing, registrogenesis and general tensing (i.e., Stage 2). As a result, the language is exiting the 
pseudoregister life cycle out of a side door, so to speak, abandoning pseudoregister and returning instead 
to purely segmental contrasts of vowel quality. Strengthening the argument that pseudoregister loss is 
in fact in progress in the High Pacoh dialect under discussion here is the fact that two other dialects of 
Pacoh, Cado (Gehrmann 2015, 2019) and Bahi (Watson 1996), have already experienced pseudoregister 
loss.⁷ 


7 Summary and Outlook 

In this paper, it has been demonstrated that the pseudoregister contrast of Pacoh splits Pacoh vocalism 
into a high pseudoregister cued by more open vowel quality, slightly lower pitch and slightly tenser 
voice quality and a low pseudoregister cued by closer vowel quality, slightly higher pitch and slightly 
laxer voice quality. After establishing a typological profile for pseudoregister languages based on 
documented instances of the phenomenon, it was shown that Pacoh does not fit neatly into either the 
more conservative lax-marked pseudoregister typology or the more innovative tense-marked typology. 
Instead, Pacoh appears to be an abortive pseudoregister language, which is abandoning the contrast and 
reverting to a language without register or pseudoregister, as the connection between pseudoregister 
vowel pairs breaks down and contrasts of vowel quality take their place. 


6 Including Rengao, Hre, Kayong, Jeh & Halang (cf. Smith 1972, Sidwell 2015). 

7 It remains possible that Cado and Bahi never underwent pseudoregister formation in the first place. However, 
the devoicing of *voiced stops without register reflexes and certain developments in their vowel inventories 
suggest that pseudoregister formation was common to an earlier phase of Pacoh. 

This has been a mere introduction to the study of Pacoh pseudoregister, based on the speech of one 
speaker recorded half a century ago. Further documentation of the Pacoh language is a research priority, 
as there may be more conservative dialects which preserve more faithfully the old Pacoh pseudoregister 
contrast or, perhaps, a different iteration on it with a separate evolutionary history. This would inform 
the model of pseudoregister evolution put forward here. 

More generally, further documentation of the pseudoregister phenomenon is sorely needed, 
especially when it comes to early-stage, lax-marked pseudoregister. Researchers who have the 
opportunity to gather data on the few remaining pre-registral / pre-tonal Austroasiatic languages should 
prioritize quantitative investigations into differences of voice quality and pitch among vowels of 
different vowel height series in these languages, in addition to looking for differences in cues from the 
register bundle after onsets of different laryngeal settings. Acoustic and perceptual studies on such 
languages will be necessary to make progress in understanding how vowel height contrasts may be 
transformed into pseudoregister contrasts. 


Abstract 

In this article, we detail the prosodic typology of disyllables in Santali spoken in Sonitpur 
district, Assam, India. Previous investigations present a chaotic picture of contradictory 
analyses. For Santali, Ghosh (2008:30) states “(s)tress is always on the second syllable of 
the word regardless of whether it is an open or a closed syllable”. Neukom (2001:8), on 
the other hand, claims that in disyllabic stems, “(s)tress falls on the first syllable; however, 
if the first syllable is light and the second heavy (iambic structure), stress falls on the 
second syllable”, i.e., a Quantity Sensitive system. Bodding (1922) detailed a partly 
morpholexical system of prominence assignment. To test these competing claims, we recorded data from male 
and female speakers of various ages in a noise-free environment in the field. Forms were 
recorded in isolation, in a quasi-focal frame “I ___ said”, and in an out-of-focus frame “I 
___ LOUDLY said, not SOFTLY” to control for focal intonation effects and those of 
phrasal or utterance intonation. Based on subsequent instrumental analysis, we suggest that 
Assam Santali always shows prominence on the second syllable of disyllables, cued by 
intensity, f0 and duration—a pattern like that attested in Assam and Odisha lects of Sora 
(Horo and Sarmah 2015, Horo 2017, Horo et al. 2020), contra Donegan and Stampe 
(2004)—an iambic pattern and not quantity sensitive, at least in disyllables. 


Keywords: Santali, Sora, Munda, Prosody, Prominence 
ISO 639-3 codes: sat 


1 Introduction 

In this article, we detail a preliminary study on the prosodic typology of the Kherwarian Munda 
languages, a group of Austroasiatic languages spoken in eastern and northeastern India, focusing here 
on one lect of Santali spoken in Sonitpur district, Assam, India. This variety has not been described 
previously, nor have any Kherwarian lects been studied experimentally using instrumental phonetic 
methods before. 

The major group of Santali speakers resides to the southwest of this region in West Bengal, 
Jharkhand, and Northern Odisha. Some of these varieties of the language have been subjected to 
previous analysis in some domains, but not to date in terms of instrumental phonetic analysis. Thus, 
only impressionistic statements have been made to date in the literature about the prosodic or 
intonational system of Santali. Indeed, the previous investigations present a chaotic picture of 
contradictory analyses. 

For Santali, Ghosh (2008:30) states that stress is fixed on the second syllable, while Neukom 
(2001:8) claims a quantity-sensitive system in which stress falls on the initial syllable unless the first 
syllable is light and the second heavy, in which case the second syllable takes stress. Bodding (1922), 
for his part, detailed a partly morpholexical system of prominence assignment. Subjecting these 
competing analyses to the rigors of instrumental verification is an ongoing process. In this preliminary 
study, we limit ourselves to uninflected lexical items that are disyllabic in their underlying structure, 
with a few that surface as trisyllabic with a weak medial syllable resolving a word-medial onset cluster. 
Forms were recorded in isolation, in a quasi-focal frame “I ___ said”, and in an out-of-focus frame 
“I ___ LOUDLY said, not SOFTLY” to control for focal intonation effects and those of phrasal or 
utterance intonation. Based on the data acquired in these formats and subsequent phonetic analysis, 
prominence in Santali disyllables is explored and the findings, supported by acoustic evidence, are 
reported in this paper. 


2 Santali of Assam 

According to the Census of India (Registrar General of India 2011) there are 7,368,192 Santali speakers 
in India, of which 213,139 speakers are reported in Assam. The Santali variety of Assam has not been 
analyzed previously. The current study presents a preliminary finding that is based on speech data 
recorded from Santali speakers living in two villages of Assam’s Sonitpur district, namely, Erasuti and 
Borbil (see Figure 1). There are approximately eight hundred Santali individuals in Erasuti and one 
hundred and fifty Santali individuals in Borbil, and both villages have a major concentration of Santali 
speakers in Sonitpur District of Assam. The inhabitants also claim to have lived in these villages for at 
least four generations. Moreover, unlike most ethnic Munda inhabitants of Assam, who are 
reported to have migrated to Assam as indentured tea laborers from parts of Eastern India in the 
nineteenth century (Tea Districts Labour Association-India 1924), the Santali community recorded in 
this work claim that they are native inhabitants of the land, not migrant laborers. 


Figure 1: Map of Assam highlighting locations of Santali speech data collection 
Santali Villages in Sonitpur District of Assam, India 


For the purposes of this study, a total of six native Santali speakers (three male and three female) living 
in the two villages were interviewed and recorded to acquire the data. The average age of the 
participants is thirty-one years with a standard deviation of eleven years. Each participant is a 
multilingual speaker, and besides Santali they also speak Sadri, Assamese, and Hindi as their second, 
third and fourth languages. Among the six participants, only one male and one female have completed 
their high school education, whereas the others dropped out of their formal education either during high 
school or even earlier. 


2.1 Data collection 

To collect Santali speech samples, a word list was compiled from basic Santali vocabulary, from which 
the text data, comprising fifty-one Santali disyllabic words with (C)V(C).(C)CV(C)¹ syllable structures, 
were generated. All words in the dataset are non-derived nouns, including words for body parts, animal 
names and words for natural objects. Table 1 presents a subset of the text data used in this study to 
generate the speech data. 


1 Four targeted disyllabic words that have onset clusters in the second syllable, [tʃepre], [sikri], [kʰapɽi] and 
[pandʒra], were variably produced as trisyllabic words with the insertion of an epenthetic vowel [ə] 
breaking up the obstruent-rhotic consonant clusters present in the words. Such variants are not included in 
the analysis. 


Table 1: Subset of Santali disyllabic data used in the study 


Santali | English | Syllable Structure 
ado     | ‘urine’ | V.CV 
ipil    | ‘star’  | V.CVC 
supu    | ‘arm’   | CV.CV 
lutur   | ‘ear’   | CV.CVC 
jinda   | ‘night’ | CVC.CV 
katkom  | ‘crab’  | CVC.CVC 


Subsequently, the speech data is derived from eliciting and recording the text data once in isolation and 
once in each of the two sentence frames shown in (1) and (2). 


(1) iɲ   ___  men-kediɲ 
    1SG  ___  say-PST.TR/ACT.1SUBJ 
    ‘I said ___’ 

(2) iɲ   ___  gula=te   men-kediɲ             lahe=te=do     baŋ 
    1SG  ___  loud=ADV  say-PST.TR/ACT.1SUBJ  soft=ADV=CONJ  NEG 
    ‘I said ___ loudly, not softly’ 


In both (1) and (2), the blank space is replaced by a target word from the text data. The phrasal position 
in (1) is intended to capture phrasal prominence in the target words and control for any speech 
perturbation that may be caused during the production of words in isolation, and the target position in 
(2) is intended to record the words in an out of focus or an unaccented intonational context to control 
for possible information structure effects. The fifty-one unique lexical items recorded thrice from six 
individuals produced a total sample size of 918 Santali disyllables, of which 46.13% have the CV.CV 
syllable structure with the vowels [a, i, e, o, u]² as the syllable nucleus of both the first and second 
syllables. The data were recorded in a noise-free environment in the field using a head-worn 
unidirectional Shure mic connected by XLR cable to a Tascam linear PCM recorder, and the digital 
data are stored at a sampling frequency of 44.1 kHz and a bit depth of 32 bits in .WAV format. 


2.2 Data analysis 

The Santali speech data collected from the field were subjected to phonetic analysis by means of 
acoustic phonetic methods, for which purpose the data were manually annotated for word boundary and 
phoneme boundary in Praat (Boersma and Weenink 2020) using the spectral and temporal cues of 
speech sounds. Vowels are annotated between the beginning and end of glottal pulses, 
and sonorants are annotated in low-amplitude regions. Obstruents are 
annotated between the release of the oral closure and the beginning of the glottal pulses in onset 
position and between the end of glottal pulses and the point of oral closure in coda position (see Figure 
2). 


2 The data set also includes three nasal vowels [ã, ĩ, ẽ], with [ã] appearing in the first syllable, [ĩ] appearing in 
the second syllable and [ẽ] appearing in both first and second syllables; a diphthong [ai] appearing in the 
second syllable; and a lax front vowel [ɛ] appearing in the second syllable. They are for now not treated as 
unique syllable nuclei in the general analysis. However, for examining the interaction between syllable 
prominence and vowel types, the nasal vowels, the diphthong(s) and the lax front vowel are not included. Also, 
Ghosh (2008) includes schwa in his Santali vowel phoneme inventory, but Bodding (1922) considers it to be 
an allophone of [a] in words with high vowels. Until we have done instrumental analysis on the vowel system 
of various Santali lects, we reserve judgment on this issue too. 


Figure 2: Spectrographic illustration³ of phonetic annotation of a Santali disyllable as produced by a 
male Santali speaker living in Sonitpur district of Assam 

2.2.1 Analyzing prominence in Santali disyllables 

The analysis of prominence in Santali disyllables is based on three acoustic cues, namely, vowel 
duration, vowel intensity and fundamental frequency (Fry 1955; 1958). While vowel duration is 
calculated from the absolute length of the vowel nuclei in first and second syllables, vowel intensity is 
measured from the mean amplitude of the entire length of the vowel nuclei. Likewise, fundamental 
frequency is estimated from the mean of the entire vowel segment. Additionally, all values for the three 
acoustic parameters are normalized for speaker variability using the z-score normalization method 
(z = (x − μ)/σ)⁴. The normalized data is then analyzed and visually represented in R Version 3.5.3 (R Core 
Team 2019) using the ggplot2 package (Wickham et al. 2016) through its built-in functions of geom_boxplot 
and geom_density where the normalized values of vowel duration, vowel intensity and average 
fundamental frequency are treated as dynamic variables. Syllable positions and the contexts of the 
utterance (isolation, phrasal frame, unaccented frame) are treated as factor variables. Further, the same 
normalized data is tested with a one-way analysis of variance (ANOVA) using the aov function in R 
Version 3.5.3 (R Core Team 2019). For this purpose, normalized values of vowel duration, vowel 
intensity and average fundamental frequency are treated as dependent variables and syllable position 
(first versus second) and context of utterance (isolation, phrasal and unaccented) are treated as 
independent variables. 
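The normalization and the significance test described above can be illustrated with a minimal sketch. It is written in Python using only the standard library rather than in R, which the study itself used, and it is not the authors' script. The row layout, the field names (speaker, syllable, duration), the toy values and the choice of the sample standard deviation are all assumptions made for illustration.

```python
from statistics import mean, stdev


def zscore_by_speaker(rows, measure):
    """Return new rows in which `measure` is z-scored within each speaker.

    z = (x - mu) / sigma, where mu and sigma are that speaker's own mean and
    standard deviation. Using the sample standard deviation is an assumption;
    each speaker needs at least two values that are not all identical.
    """
    values = {}
    for row in rows:
        values.setdefault(row["speaker"], []).append(row[measure])
    stats = {spk: (mean(v), stdev(v)) for spk, v in values.items()}
    normalized = []
    for row in rows:
        mu, sigma = stats[row["speaker"]]
        normalized.append({**row, measure: (row[measure] - mu) / sigma})
    return normalized


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of numbers."""
    pooled = [x for group in groups for x in group]
    grand_mean = mean(pooled)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between = len(groups) - 1
    df_within = len(pooled) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within


# Invented vowel durations in ms for two hypothetical speakers, S1 and S2.
toy_rows = [
    {"speaker": "S1", "syllable": 1, "duration": 80.0},
    {"speaker": "S1", "syllable": 2, "duration": 120.0},
    {"speaker": "S1", "syllable": 1, "duration": 90.0},
    {"speaker": "S1", "syllable": 2, "duration": 130.0},
    {"speaker": "S2", "syllable": 1, "duration": 60.0},
    {"speaker": "S2", "syllable": 2, "duration": 95.0},
    {"speaker": "S2", "syllable": 1, "duration": 70.0},
    {"speaker": "S2", "syllable": 2, "duration": 105.0},
]

normalized = zscore_by_speaker(toy_rows, "duration")
first = [r["duration"] for r in normalized if r["syllable"] == 1]
second = [r["duration"] for r in normalized if r["syllable"] == 2]
f_value, df_between, df_within = one_way_anova([first, second])
print(f"F({df_between},{df_within}) = {f_value:.2f}")
```

With two syllable positions as the only grouping factor, such a test has 1 and N − 2 degrees of freedom, where N is the number of vowel tokens entered for a given utterance context.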


2.2.2 Analyzing segmental effect on prominence in Santali disyllables 

This study also includes an investigation of segmental effects on syllable prominence in Santali 
disyllables. First, to examine the interaction between syllable prominence and vowel types, a subset of 
data, having only the five frequently occurring vowels [a, i, e, o, u] in first and second syllables of 
Santali disyllables, is used. In this analysis, vowel duration, vowel intensity and average fundamental 
frequency of the five vowels are measured separately in the two syllable positions. The same data is 
then visually represented using the built-in functions of geom_boxplot and facet_grid in the ggplot2 
package (Wickham et al. 2016) of R Version 3.5.3 (R Core Team 2019), in which the normalized values 
of vowel duration, vowel intensity and average fundamental frequency are treated as dynamic variables 
and syllable positions, vowel types and contexts of utterances are treated as factor variables. 


3 This sound file can be heard and accessed in the Santali Living Dictionary at 
https://livingdictionaries.app/santali/entries/fOnG6dDN9GIg YOWPSs6)C . 

4 z = normalized value; x = individually extracted value; μ = mean of x; σ = standard deviation of x. 

Second, the effect of different consonantal environments on syllable prominence was examined. 
For this analysis, a subset of data was used that includes only words with the frequently occurring CV.CV 
syllable structure and with the onset consonants [h, k, m, r, tʃ] in both syllables. Vowel duration, vowel 
intensity and average fundamental frequency in first and second syllables that have the five onset 
consonants were examined and compared with each other. This analysis is also represented using the 
same visualization method described above for examining the interaction between syllable prominence 
and vowel types, except that the vowel type factor is replaced by an onset consonant type factor. 
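The per-segment comparison described in this section can be sketched as a small grouping step. The sketch below is an illustration in Python (standard library only), not the authors' R code: it computes the mean of a normalized measure for every combination of utterance context, segment type and syllable position, which is the kind of cell-by-cell comparison that faceted boxplots display. The field names (context, vowel, syllable, duration_z) and the toy values are assumptions.

```python
from collections import defaultdict
from statistics import mean


def cell_means(rows, factor, measure):
    """Mean of `measure` per (context, factor level, syllable position) cell."""
    cells = defaultdict(list)
    for row in rows:
        key = (row["context"], row[factor], row["syllable"])
        cells[key].append(row[measure])
    return {key: mean(values) for key, values in cells.items()}


def second_minus_first(means):
    """Second-syllable mean minus first-syllable mean for each (context, level).

    Only cells observed in both syllable positions are compared. A positive
    value means the measure is greater in the second syllable.
    """
    diffs = {}
    for (context, level, syllable), value in means.items():
        if syllable == 2 and (context, level, 1) in means:
            diffs[(context, level)] = value - means[(context, level, 1)]
    return diffs


# Invented z-scored durations; these are not values from the study.
toy_rows = [
    {"context": "isolation", "vowel": "a", "syllable": 1, "duration_z": -0.5},
    {"context": "isolation", "vowel": "a", "syllable": 2, "duration_z": 0.7},
    {"context": "isolation", "vowel": "a", "syllable": 2, "duration_z": 0.9},
    {"context": "phrasal", "vowel": "i", "syllable": 1, "duration_z": -0.2},
    {"context": "phrasal", "vowel": "i", "syllable": 2, "duration_z": 0.1},
]

means = cell_means(toy_rows, factor="vowel", measure="duration_z")
for cell, diff in sorted(second_minus_first(means).items()):
    print(cell, round(diff, 2))
```

Passing a different factor name, for example a hypothetical onset field holding the onset consonant, would give the corresponding comparison across consonantal environments.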


3 Findings: Prominence in Santali disyllables 


3.1 Vowel duration 

Vowel duration is a common acoustic cue to prominence, with longer vowel duration considered a 
robust indicator of prominence in languages of the world (Gordon and Roettger 
2017). In the case of Santali (Ghosh 2008:23), vowel length is not reported to be phonemically distinct, 
and the data presented here exhibits only the phonetic length of vowel segments as produced in syllable 
nuclei of disyllabic words. By examining the speech data as produced by the Santali speakers recorded 
in this study, it was observed that vowel duration is generally longer in the second syllable as compared 
to vowel duration in the first syllable in Santali disyllables. Figure 3 demonstrates the vowel duration 
distinction in first and second syllables in the targeted disyllables as recorded in the three utterance 
contexts. 

From Figure 3, it is evident that in Santali disyllables, average vowel duration in the second syllable 
is always longer than in the first syllable. Additionally, in Figure 4, the density plots reveal that the 
distribution of vowel duration in first and second syllables is distinct in all three utterance contexts. 
Also, a distinct skewing was observed in the data, indicating that longer vowel durations occur in second 
syllables, whereas shorter vowel durations occur in first syllables. 


Figure 3: Average vowel duration in first (1) and second (2) syllables in Santali disyllables 


Figure 4: Density distribution of vowel duration differences in first (1) and second (2) syllables in 
Santali disyllables 


In this regard, it is notable that the distinction in vowel duration is greater when the disyllabic words 
are produced in isolation than when they are produced in the phrasal and in the unaccented intonational 
contexts.⁵ Moreover, the difference in vowel duration in first and second syllables of Santali disyllables 
is found to be statistically significant in isolation [F(1,642) = 796.4, p < 0.001], in phrasal contexts 
[F(1,629) = 101.3, p < 0.001], and in unaccented intonational contexts [F(1,622) = 248.9, p < 0.001]. 


3.2 Vowel intensity 

Vowel intensity refers to the acoustic energy in a vowel segment which is normally greater in prominent 
syllables than in non-prominent syllables. Unlike vowel duration, vowel intensity is a less robust cue 
for diagnosing prominence in languages, yet there is evidence that prominent syllables are distinct from 
non-prominent syllables with respect to their mean intensities (Gordon and Roettger 2017). 
Accordingly, in the present data, it is observed that both the averages as well as the overall distribution 
of average vowel intensity in first and second syllables of Santali disyllables are distinct from each other 
(see Figures 5 and 6). 

From Figure 5, it is revealed that in Santali disyllables the average vowel intensity in second 
syllables is greater than the average vowel intensity in first syllables. Also, this is found to be true for 
Santali disyllables produced in all three utterance contexts. Likewise, the density plots in Figure 6 reveal 
that the distribution of average vowel intensity in first and second syllables is distinct wherein a skewing 
towards higher vowel intensity is observed in second syllables but a skewing towards lower vowel 
intensity is observed in first syllables. Also, the average vowel intensity differences in each of the three 
utterance contexts, namely, isolation [F(1,642) = 30.62, p < 0.001], phrasal [F(1,629) = 74.8, p < 0.001] 
and unaccented intonational contexts [F(1,622) = 33.72, p < 0.001], are found to be statistically 
significant. 


5 This may indicate that vowel duration plays a role in demarcating the end of utterances, but this remains to be 
examined more systematically. 


Figure 5: Average vowel intensity in first (1) and second (2) syllables in Santali disyllables 


Figure 6: Density distribution of vowel intensity differences in first (1) and second (2) syllables in 
Santali disyllables 


3.3 Fundamental frequency 

Fundamental frequency represents pitch variation in speech sounds, and a systematic variation in pitch 
across syllables is known to be an indicator of syllable prominence in various languages (Gordon and 
Roettger 2017). Generally, prominence is associated with higher pitch which is expressed by greater 
fundamental frequency in the prominent syllable as opposed to lower fundamental frequency realized 
in the non-prominent syllable. The Santali data examined in this work reveals a similar pattern, but an 
exception is also observed in the analysis. Figure 7 shows the average fundamental frequency 
differences in first and second syllables of Santali disyllables. 


Figure 7: Average fundamental frequency in first (1) and second (2) syllables in Santali disyllables 


Figure 8: Density distribution of fundamental frequency in first (1) and second (2) syllables in 
Santali disyllables 


Figure 7 reveals that, in Santali disyllables, while the average fundamental frequency is higher in the 
second syllable than in the first syllable in the phrasal and unaccented intonational contexts, variation 
is absent when the disyllabic words are uttered in isolation. The same pattern is observed in the density 
plots presented in Figure 8, which shows that the distribution of average fundamental frequency in first 
and second syllables is not distinct in words that are spoken in isolation, whereas the same distributions 
appear to be distinct in words that are spoken in both the phrasal and unaccented intonational contexts. 
In this regard, the findings are also confirmed through statistical analysis whereby the average 
fundamental frequency difference in first and second syllables of Santali disyllables is found to be 
distinct with statistical significance when spoken in phrasal [F(1,629) = 229.8, p < 0.001] and 
unaccented [F(1,622) = 183.1, p < 0.001] intonational contexts but not when spoken in isolation 
[F(1,642) = 1.354, p = 0.245]. Thus, the analysis here suggests that although prominence, 
realized by higher pitch, in the second syllable is present in Santali disyllables, the distinction is likely to 
be neutralized in words that are produced in isolation. However, this observation requires further 
investigation⁶ with more substantial data, which could not be achieved in this preliminary study of 
prominence in Santali disyllables. 
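The follow-up analysis suggested in footnote 6, which samples pitch at initial, medial and final points of the vowel and extracts pitch maxima and minima, could be sketched as follows. This is a hypothetical illustration in Python, not part of the study: it assumes that an f0 track has already been extracted for each vowel as a list of evenly spaced values in Hz covering voiced frames only, and the sampling points at 10%, 50% and 90% of the vowel are an arbitrary choice.

```python
def pitch_profile(f0_track, fractions=(0.1, 0.5, 0.9)):
    """Summarize one vowel's f0 track (Hz, evenly spaced, voiced frames only).

    Returns f0 at three relative time points (initial, medial, final, taken
    at the nearest available frame) together with the minimum, maximum and
    range of the track.
    """
    if not f0_track:
        raise ValueError("empty f0 track")
    last = len(f0_track) - 1
    initial, medial, final = (f0_track[round(frac * last)] for frac in fractions)
    lowest, highest = min(f0_track), max(f0_track)
    return {
        "initial": initial,
        "medial": medial,
        "final": final,
        "min": lowest,
        "max": highest,
        "range": highest - lowest,
    }


# Invented rising contour with 11 frames from 100 Hz to 120 Hz.
toy_track = [100.0 + 2 * i for i in range(11)]
profile = pitch_profile(toy_track)
print(profile)
```

Comparing such profiles across the two syllable positions would show whether a pitch movement or a difference in pitch range is present in isolation even when the vowel means do not differ.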


4 Findings: Segmental effects on syllable prominence 

To explore the interaction between syllable prominence and segmental types and to see if the 
prominence of the second syllable over the first syllable in Santali disyllables is consistently maintained 
in different vowel types and in different consonantal environments, the three acoustic cues of 
prominence, namely, vowel duration, vowel intensity and fundamental frequency are further examined 
in this study using two separate subsets of data. Firstly, the five vowels in Santali [a, e, i, o, u], separated 
by their syllable positions, were examined to see whether all vowel types render similar patterns of syllable 
prominence. Secondly, a subset of data containing only the CV.CV syllable structure and having 
the onset consonants [h, k, m, r, tʃ] in both syllables was examined to see whether changes in the consonantal 
environment also affect syllable prominence. 


4.1 Segmental effects on duration 

With respect to vowel duration in first and second syllables having different vowel nuclei, it is observed 
that all vowel types have longer vowel duration only in the second syllable in all three contexts (see 
Figure 9). This implies that in Santali disyllables, the five vowels [a, e, i, o, u] are phonetically longer 
only when they occur in the second syllable but are relatively shorter when they occur in the first 
syllable, and that prominence of the second syllable manifested by longer vowel duration is not affected 
by changes in vowel types in the syllable nuclei. Likewise, by examining vowel duration in the 
environment of the five Santali consonants [h, k, m, r, tʃ], it is observed that in Santali disyllables that 
have any of the five consonants in the onset position, vowel duration is longer in the second syllable 
only, and not in the first syllable (see Figure 10). Also, the pattern is observed to be consistent in all 
three utterance contexts included in this study. This implies that in Santali disyllables, different onset 
consonants do not have an impact on syllable prominence cued by vowel duration. 


6 A further investigation of pitch variation in first and second syllables can be achieved by extracting pitch 
values (fundamental frequencies) at different time points in the vowel, including initial, medial, and final. This can show 
potential pitch changes that may be concealed in the average pitch distinction in the two syllable positions 
in the isolation context. Also, maximum and minimum pitch values can be extracted, which can be utilized 
for examining potential separation of pitch ranges in the two syllable positions when the disyllables are 
produced in isolation. 


Figure 9: Average vowel duration in first (1) and second (2) syllables in Santali disyllables having 
different vowel nuclei 


Figure 10: Average vowel duration in first (1) and second (2) syllables in Santali disyllables with 
CV.CV syllable structure having different consonantal environments 


Thus, based on the data presented in Figures 9 and 10, it is confirmed that vowel duration is a robust 
cue for identifying prominence in Santali disyllables whereby neither the vowel types nor the onset 
consonants in CV.CV syllable structures affect the manifestation of prominence in the second syllable. 


4.2 Segmental effects on intensity 

The general examination of vowel intensity in first and second syllables of Santali disyllables revealed 
that vowel intensity is normally higher in the second syllable. In the micro analysis, while the tendency 
to produce higher intensity in the second syllable is found consistently in the mid vowels [e, o] and the 
low vowel [a], in the case of the two high vowels [i] and [u], the vowel intensity difference between the 
two syllables is observed to be neutralized and even reversed for the high front vowel [i] in the 
unaccented intonation context (see Figure 11). This indicates that the two high vowels [i] and [u] in 
Santali disyllables do not have distinct vowel intensities between the two syllables. Also, while the 
distinction is observed to be equally neutralized in all three utterance contexts for the back high vowel 
[u], in the case of the front high vowel [i], the difference is neutralized in isolation and in the phrasal context 
but is reversed in the unaccented intonational context. Thus, from analysing the interaction between 
vowel types and vowel intensity in Santali disyllables, it is revealed that prominence of the second syllable 
cued by higher vowel intensity is maintained in disyllables that have either a mid vowel or the low 
vowel in the syllable nuclei, whereas the pattern is likely to be neutralized or reversed if the disyllables 
have only the high vowels in their syllable nuclei. 


Figure 11: Average vowel intensity in first (1) and second (2) syllables in Santali disyllables having 
different vowel nuclei 


Figure 12: Average vowel intensity in first (1) and second (2) syllables in Santali disyllables with 
CV.CV syllable structure having different consonantal environments 


In addition to the variations in vowel intensity arising from vowel types, an examination of the 
interaction between vowel intensity and different onset consonants reveals that vowel intensity 
differences in first and second syllables may be neutralized or reversed even in the environment of 
certain consonant types in the onset position. Of the five onset consonants examined in this study, the 
analysis here reveals that the general tendency to produce higher vowel intensity in the second syllable 
has only been consistently maintained in all three contexts of utterances in the environment of the 
bilabial nasal consonant [m] (see Figure 12). On the other hand, vowel intensity distinctions between 
first and second syllables are observed to be neutralized in the environment of the voiceless velar 
consonant [k] and alveolar trill consonant [r] in the isolation and unaccented intonational contexts. The 
difference is seen to be reversed in the environment of the voiceless palatal affricate consonant [tʃ] in 
the isolation context only. These observations imply that, like vowel types, onset consonant types in the 
CV.CV syllable structure also have an impact on the rendering of vowel intensity in Santali disyllables, 
whereby consonants such as [m] and [h] appear to have no or minimal impact. In contrast, consonants 
such as [k] and [r] may neutralize the intensity distinction, and consonants such as [tʃ] may even reverse 
the vowel intensity distinction in Santali disyllables. 

Thus, from analysing the interaction between vowel intensity and vowel types and onset consonant 
types it is revealed that, unlike vowel duration, vowel intensity is a less robust cue for identifying 
syllable prominence in Santali disyllables. Specifically, at the micro level, it is found that there are 
certain segmental exceptions that probably suppress the general tendency to produce higher intensity 
vowels in the second syllable of Santali disyllables. 


4.3 Segmental effects on fundamental frequency 

Fundamental frequency difference in Santali disyllables has been shown to be sensitive to various 
utterance contexts whereby higher fundamental frequency in the second syllable is exhibited only in the 
phrasal and unaccented intonational contexts but not in the isolation context of utterance. Significantly, 
the same pattern of fundamental frequency distinction is observed when the five Santali vowels [a, e, i, 
o, u] are analysed separately with respect to their syllable positions in the disyllables. Figure 13 presents 
the vowel-wise fundamental frequency distinction in first and second syllables of Santali disyllables in 
the three contexts of utterances included in this study. From Figure 13 it is evident that all five Santali 
vowels have higher fundamental frequency in the second syllable when they are produced in the phrasal 
and unaccented intonational contexts, but as an exception not in the isolation context. This implies that 
differences in vowel types do not impact the second syllable prominence depicted by higher 
fundamental frequency in Santali disyllables. 

Similarly, by examining the effect of different onset consonants on the fundamental frequency of 
the vowel nuclei of Santali disyllables, it is observed that besides the lack of fundamental frequency 
distinction in first and second syllables in the isolation context of utterance there is only a minimal 
impact of onset consonant types even in the phrasal and unaccented intonational contexts of utterances. 
Figure 14 presents the fundamental frequency difference in first and second syllables of Santali 
disyllables that are grouped according to their onset consonants [h, k, m, r, tʃ] and the contexts of 
utterances that are included in this study. 

Figure 13: Average fundamental frequency in first (1) and second (2) syllables in Santali disyllables 
with different vowel nuclei 

Figure 14: Average fundamental frequency in first (1) and second (2) syllables in Santali disyllables 
with CV.CV syllable structure in different consonantal environments 

From Figure 14, it is observed that the fundamental frequency difference between first and second 
syllables of Santali disyllables may remain neutralized when the target word is said in isolation for 
words beginning in [k, m, r], while in the very same context the difference may even be reversed in 
the environment of the glottal fricative consonant [h] and the voiceless affricate consonant [tʃ]. Also, 
the same consonantal environments appear to neutralize or minimize the fundamental frequency 
difference between first and second syllables of Santali disyllables even when they are produced in 
the phrasal and unaccented intonational contexts of utterances. 

Thus, the analysis of vowel-wise fundamental frequency differences in first and second 
syllables of Santali disyllables revealed that vowel types do not impact syllable prominence in the 
language. Additionally, the exceptional case of neutralizing fundamental frequency differences in first 
and second syllables in the isolation context of utterance is also confirmed from the examination of the 
interaction of vowel types and fundamental frequency in Santali disyllables. However, an analysis of 
different onset consonants indicates that two consonant types, namely fricatives and affricates, may 
either minimize the fundamental frequency difference in first and second syllables or even reverse the 
difference when Santali disyllables with CV.CV syllable structures bearing the two consonants in the 
onset position are recorded in isolation only. 


5 Comparison with Sora 

Despite claims in the literature to the contrary (Donegan 1993, Donegan and Stampe 1983, 2004), 
instrumental analyses show that Sora clearly has second syllable prominence in disyllabic forms (Horo 
and Sarmah 2015, Horo 2017, Horo, Sarmah and Anderson 2020). Phonetic data representing Sora 
speech varieties of four geographical locations in Assam, namely, Singrijhan, Sessa, Lamabari and 
Koilamari and one geographical location in Odisha, namely, Raiguda, provide evidence that vowels in 
the second syllable are longer, louder and pitched higher than the vowels in the first syllable of 
disyllables. Accordingly, the acoustic cues of prominence reveal that vowel duration, vowel intensity 
and fundamental frequency (f0) are generally greater in the second syllable; see Figures 15 to 17. 


Figure 15: Average vowel duration in first and second syllable of Sora disyllables (Horo 2017) 

Figure 16: Average f0 in first and second syllable of Sora disyllables (Horo 2017) 

Figure 17: Average vowel intensity in first and second syllable of Sora disyllables (Horo 2017) 

Likewise, other Munda languages have been claimed to show similar patterning as well (e.g., Remo 
(Bhattacharya 1968) or West Bengal Santali (Ghosh 2008)), but no instrumental data have been offered 
in support of this, even while we believe those published observations to be accurate. However, Assam 
Santali, based on preliminary phonetic analysis and supported by acoustic evidence, shows that second 
syllable prominence is the pattern attested in disyllabic words. 

On the other hand, Indo-Aryan and Dravidian languages, including those of Odisha, are typically 
trochaic or first-syllable prominent in similar contexts, at least those studied instrumentally to date, 
which may in fact be limited to Oriya (Mahanta 2010) and Telugu (which is not an official language of 
Odisha but is spoken in Parlakhemundi in the extreme south of Odisha, on the border). These languages 
have more peripheral vowels in the first syllable and more phonological contrasts attested, all pointing 
to first-syllable prominence (Khan 2016). 


6 Discussion 
The phonetic analysis supported by acoustic evidence above suggests that duration and intensity are 
largely consistent cues of prominence in the Assam lect of Santali when investigated across all three 
utterance contexts, viz., in isolation, in a phrasal frame and in an unaccented frame. In each instance, 
the second syllable reflects greater prominence with respect to these cues than does the corresponding 
first syllable in these disyllabic words. Fundamental frequency is also distinct and points to prominence 
on the second syllable over the first syllable in Assam Santali as well. However, unlike duration and 
intensity, fundamental frequency is only statistically relevant as an acoustic cue of prominence in 
phrasal and unaccented utterance contexts. In the isolation context on the other hand, fundamental 
frequency does not cue word-level prominence per se in the Assam Santali lect examined. 
Simultaneously, as the isolation context may potentially reflect word-level intonation as well as 
intonation of the full utterance level (two areas where pitch variation can well be distinct), there may be 
intonational parameters that operate on the full utterance level that interact with and potentially override 
word-level intonational or prominence cueing parameters. However, what these may be requires further 
research. Also, duration does not appear to show sensitivity to specific vowel types or consonant 
contexts in its function as a cue of prominence. Intensity differences between first and second 
syllables on the other hand appear to be sensitive to both the presence of high vowels and the presence 
of [k], [r] or [tʃ] in onset position, while fundamental frequency may show some sensitivity to 
consonantal environment, specifically the presence of an initial fricative [h] or affricate [tʃ]. 
Furthermore, the fact that the phrasal frame data in our study (potentially an inherently quasi-focus 
position) and the unaccented frame data (explicitly an out-of-focus position) largely pattern 
together with respect to the three acoustic cues of prominence suggests that focal intonation is not active 
per se in determining the distribution of the three examined cues of word-level prominence. 
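
The comparison underlying this discussion, second-syllable versus first-syllable values for each cue in each utterance context, can be illustrated with a minimal Python sketch. The token tuples, cue labels and all numeric values below are invented placeholders and are not measurements from this study. The sketch reports only mean differences and does not reproduce the statistical testing on which the conclusions above rest.

```python
from statistics import mean

# Each token: (context, cue, syllable-1 value, syllable-2 value).
# All numbers are invented placeholders, NOT measurements from the study.
TOKENS = [
    ("isolation", "duration_ms", 80.0, 120.0),
    ("isolation", "f0_hz", 200.0, 198.0),
    ("phrasal", "duration_ms", 70.0, 100.0),
    ("phrasal", "f0_hz", 195.0, 215.0),
]


def second_syllable_advantage(tokens):
    """Return {(context, cue): mean of (syllable 2 - syllable 1)}."""
    diffs = {}
    for context, cue, s1, s2 in tokens:
        diffs.setdefault((context, cue), []).append(s2 - s1)
    return {key: mean(values) for key, values in diffs.items()}


advantage = second_syllable_advantage(TOKENS)
for (context, cue), value in sorted(advantage.items()):
    # A positive mean difference means the second syllable has the larger value.
    direction = "second > first" if value > 0 else "no second-syllable advantage"
    print(f"{context:10s} {cue:12s} {value:+.1f}  {direction}")
```

A positive mean difference for a given cue and context corresponds to the second-syllable prominence pattern described above; a difference near zero or below zero corresponds to a neutralized or reversed distinction.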


7 Broader Munda perspectives 

That the Assam Santali lect under investigation shows second syllable prominence in disyllables 
may be surprising to some scholars, but others will note that this might be expected given 
previous analyses of West Bengal and Jharkhand Santali lects. Those who find this surprising 
might be scholars who would predict that Santali should resemble other South Asian 
language groups, such as the Indo-Aryan lects that it is in contact with, or who assume this based 
on claims made in previous investigations, not of Santali specifically, but as hypotheses 
said to apply to all Munda languages, and thus to Santali by extension. 

To be sure, one camp of scholarship has long asserted that Munda languages exclusively 
show trochaic prosodic patterns (i.e., first syllable prominence in disyllables). This is best and most 
succinctly encapsulated in the theory of ‘rhythmic holism’ put forth by Donegan (1993) and Donegan 
and Stampe (1983, 2004). They suggest that there was a one-time shift from iambic ‘rhythm’ to trochaic 
rhythm in Munda at the proto-Munda level, and that it was this prosodic shift that triggered a series of 
cascading changes that caused the wholesale typological restructuring of proto-Munda from isolating 
to agglutinative-synthetic and to verb-final syntactic structure, and so on, which resulted in modern 
Munda languages ultimately representing the mirror image of their Austroasiatic sister languages that 
remained in Southeast Asia. However, such claims have been refuted by various scholars in recent work 
on Munda, such as Horo and Sarmah (2015), Horo (2017a), Horo et al. (2020), Anderson (2015b), 
Anderson (2020) and Ring and Anderson (2018). 

As mentioned above, other scholars have made claims about the prosodic structure of Santali in 
print that do not align with our experimental findings, although it should be mentioned here that no 
acoustic or statistical data have been offered by any of the other scholars mentioned below with respect 
to the Santali data, so their claims must therefore be considered impressionistic and preliminary as a 
result. Thus, Neukom (2001:8) claims that in disyllabic stems, in Santali, “(s)tress falls on the first 
syllable; however, if the first syllable is light and the second heavy (iambic structure), stress falls on the 
second syllable”. Therefore, he suggests that Santali reflects a Quantity Sensitive system. Our data as 
presented above on disyllables do not support this view. Regardless of syllable shape, the 
pattern is always the same: the acoustic cues that indicate prominence in Santali disyllables, including 
intensity, fundamental frequency, and duration, conspire to make the second syllable more prominent 
than the first. This is also true regardless of whether the word appears 
in a quasi-focal frame or in an explicitly out of focus frame for all three examined cues of prominence, 
and in isolation as well for duration and intensity. 

Note, as also mentioned above, that not all previous researchers agree with Neukom’s take on the 
Santali data. Thus, Ghosh (2008:30) states “(s)tress is always on the second syllable of the word 
regardless of whether it is an open or a closed syllable”. Our instrumental acoustic and statistical data 
do in fact support Ghosh’s view, at least in uninflected disyllables. 

As the present investigation is just a preliminary study, we have for the time being limited ourselves 
to only examining disyllabic lexemes in this one lect, Assam Santali. Bodding’s (1922) musings on the 
topic of the system of prominence attested in Santali are rather involved. In short, casting things into a 
modern typology of prosodic systems or prominence assignment, he suggests that when taking into 
consideration all Santali words, including inflected forms of verbs which can create rather lengthy 
morphological strings (or g[rammatical] words), the system of prominence should be considered to be 
morpholexically specified and constrained. He suggests that there may be certain morphemes capable 
of bearing stress while others do not. Whether this will be verified instrumentally remains a task for 
future research. 

It may also be the case that there are mismatches between phono-prosodically defined word 
domains (such as prominence or vowel harmony, see Anderson, Horo and Harrison 2022) and larger 
morphological complexes (g-words) that may constitute more than one phono-prosodic ‘word’ or 
represent phrases prosodically, even if functioning syntactically as units.⁷ Moreover, it appears that in 
Munda languages more generally there are morphological word elements or grammatical morphemes 
that appear to be more integrated into the phono-prosodic domain of roots and form single phono- 
prosodic words with such lexical hosts, but others remain outside of such word domains. Put differently, 
some grammaticalized elements may be affixes and others rather clitics,⁸ some eligible to be assigned 
prominence for example and some that are not, and yet others that are variable in this regard. A task for 
our future research on Santali is to determine what the prosodic and morphotactic characteristics of the 
full range of inflectional elements are that may constitute Santali morphological word complexes and 
how exactly these interact (or do not interact) with the system of prominence assignment. Only once 
such a study is complete will we be able to definitively determine the entire system of prominence that 
characterizes this important Kherwarian Munda language. 

Santali is far from alone within Munda in the confusing picture that different 
publications present of what its system of prominence might be. A wide variety of other, often conflicting, 
claims about the intonational structure of individual Munda languages have appeared in print. With few 
exceptions (e.g., Rehberg 2003 for Kharia) instrumental data are not used as the basis for the analysis 
offered, so the interpretations remain largely impressionistic. With respect to Rehberg (2003), she 
proposed that low pitch on the initial syllable in a disyllabic word of Kharia followed by a high pitch is 
what signalled prominence, which, while of course possible, seems largely motivated by a desire to 
make the attested data in Kharia conform to the standard view that Munda languages are trochaic, such that 
it must be low pitch that signals prominence in Kharia if this trochaic pattern is true. We reserve further 
consideration of the Kharia data until we have had a chance to subject the data to our own instrumental 
analysis, but simply comment that the language seems to have a five-vowel system of phonemes and 
the realization of the name of the language has a schwa in the initial and allegedly prominent syllable, 
even while the schwa does not appear to be a phoneme in the language. Furthermore, intensity may 
increase concomitantly with the raising of pitch in second “unaccented” syllables in Kharia (Peterson 
2011), further underscoring that Kharia likely has second syllable prominence in disyllables, not 
first-syllable prominence. 

Mundari, a language closely related to Santali, embodies the lack of clarity about the system of 
prominence that typifies our present understanding of Munda languages as expressed in print. Even 
whether ‘stress’ exists per se is debated, as Osada (1992:36) considers Mundari to be a pitch accent 
language, while Cook (1965:100), Langendoen (1963:14-15), and Sinha (1975:39) consider Mundari 
to be a stress language. But these latter three researchers do not agree on what the system of stress is. 
Sinha considers the language to have a quantity sensitive system whereby disyllabic words of the shape 
C¹V¹C²V² or C¹V¹C²V²C³ (where C² can also be a homo-organic nasal+stop sequence) stress the second 
syllable, but in disyllabic words of the shape C¹V¹C²C³V², stress falls on the initial syllable, and in 
trisyllabic words, stress falls on the second syllable regardless of the shape. Cook (1965) states that the 
final syllable is accented only if it is closed; otherwise the initial syllable is accented in disyllabic words. 
Osada (2008:104) states that if a word is trisyllabic, stress can only be on the second or the third syllable: 
on the third syllable if that is not a suffix, and otherwise on the second syllable, but never on the first 
syllable, regardless of syllable weight. 


7 This was suggested about Munda and Khasian languages to explain some observed differences between 
apparent phrases vs. lexemes in Khasian Pnar and Sora (Ring and Anderson 2018). 

8 Santali clearly reflects such a system with subject markers which function as clitics in both imperfective and 
perfective series of inflections. Subject clitics in Santali do not even preferentially target the verbal ‘word’ as 
their host, but rather target the word immediately preceding the verb, or they may appear at the end of the 
morphological verbal word (or on occasion both places simultaneously). See Anderson (2007, 2015a, 2015b, 
2020) for more details. 

Table 2: Munda language prominence patterns and acoustic cues (based on Hildebrandt and Anderson 2021) 

Language  Prominent Syllable                      Acoustic Cues/Notes 
Sora      Second syllable in disyllables          Duration, Intensity, 
          (Horo 2017)                             Fundamental Frequency 
Gorum     Penult/final syllable                   Not yet formally investigated 
          (Anderson & Rau 2008) 
Gutob     Final/second syllable (Voß p.c.)        H-Pitch, other cues unclear. 
                                                  Not yet formally investigated 
Remo      Second syllable (Bhattacharya 1968)     Intensity, Pitch, Duration? 
                                                  Not yet formally investigated 
Gtaʔ      Second/final syllable                   Not yet formally investigated 
          (Anderson in preparation) 
Kharia    Conflicting: Initial/second/final?      L-Pitch (Initial), H-Pitch (Second/final). 
          (Rehberg 2003, Peterson 2011)           Intensity on non-initial syllables. 
                                                  Not yet formally investigated 
Juang     Conflicting: Second vs. initial         Not yet formally investigated 
          (Patnaik 2008, Dasgupta 1978) 
Kortowa   Second/last syllable of stem            Duration? 
          (Barker 1953)                           Not yet formally investigated 
Ho        Conflicting: QS/initial or              Not yet formally investigated 
          Morpholexical 
          (Nottrott 1882, Pucilowski 2013) 
Mundari   Conflicting: Pitch accent vs.           Pitch, other cues? 
          Stress accent                           Not yet formally investigated 
          Conflicting: Initial vs. 
          second vs. Morpholexical 
          (Osada 1992, 2008; Cook 1965; 
          Sinha 1975; Hoffmann 2001; 
          Langendoen 1963) 
Santali   Conflicting: Second syllable vs.        Intensity, Duration, Pitch to some degree 
          QS/initial (Bodding 1922, 
          Neukom 2001, Ghosh 2008) 
Korku     Second/QS? (Zide 2008)                  Not yet formally investigated 


However, all this aside, our data clearly show that in disyllables, prominence is found on the second 
syllable in the Assam Santali lect we discuss here. Therefore, Santali appears to reflect the same pattern 
that has previously been identified for both Assam and Odisha lects of the distantly related Sora 
language, also of the Munda family (Horo and Sarmah 2015, Horo 2017a, Horo et al. 2020). Given the 
typological, geographic and genetic distance between Santali and Sora, the strong areal 
tendencies against second syllable prominence in South Asia, and the fact that most of the language groups 
that Munda is related to phylogenetically within the Austroasiatic phylum show a similar (and 
seemingly cognate) system of prominence assignment, one might be tempted to suggest that 
proto-Munda may well have been second syllable prominent in disyllables. This would be the 
simplest explanation for the observed parallels between Sora and Santali (and, impressionistically, 
other Munda languages as well, such as Gtaʔ), in addition to their demonstrable 
similarity to other Austroasiatic groups. 


8 Future Research Goals: Towards an intonational typology of the Munda languages 

In this paper we take the first step on a long journey to compare the interface between 
phonological structure and prosodic features and the morphosyntax of the Munda languages. First, we 
must determine what the patterns and cues of prominence for each of the languages are, or at least a 
representative set of the languages. From basic uninflected words, we expand this typology for inflected 


Papers from SEALS 30 — Horo and Anderson 


forms of words and phrases and see what patterns emerge and how these different elements combine 
and whether such combinations exhibit distinct phono-prosodic patterns. This larger work is underway 
for Sora currently and will expand to Santali in the next year. We will extend this investigation 
to other Munda languages representing different branches of the family tree, with an eye not only to 
grounding future discussions of Munda prosodic structures in instrumental phonetic analysis supported 
by acoustic evidence, but also to understanding how these structures interface with the complex 
morphosyntax of the languages. 


9 Summary 

None of the previously mentioned scholars relied on instrumental data for their analyses, which 
therefore remain impressionistic. Our study is based on instrumental analysis and suggests that Santali 
(at least as spoken in Assam) always shows prominence on the second syllable of disyllables, cued by 
intensity, f0 and duration. This suggests that Assam Santali shows a pattern similar to that attested in 
Assam and Odisha lects of Sora (Horo and Sarmah 2015, Horo 2017, Horo et al. 2020), contra Donegan 
and Stampe (2004). In other words, it is an iambic pattern and not quantity sensitive, at least in 
disyllables. 

However, resolving what possible historical development the Munda languages have undergone 
both individually and at the proto-language level remains several steps away as we must first engage in 
the systematic synchronic analysis of word, phrase, and utterance level prominence systems (and 
subsets thereof) for the attested languages of the family before we have an adequate empirical basis to 
engage in such far-reaching but important questions. The present study is just a preliminary step on this 
journey. 


Abstract 

This paper presents an acoustic analysis of fundamental frequency (F0) perturbations 
conditioned by voiceless and preglottalized nasals in the speech of 20 speakers of a 
phonologically conservative, non-tonal variety of Eastern Khmu (Kmhmu’ Am). Broadly 
speaking, F0 of vowels following voiced nasals is similar to F0 following voiced plosives, 
but during the closure phase, F0 is much lower for the voiced obstruents than for the voiced 
sonorants. F0 following voiceless obstruents is initially perturbed upwards, but quickly 
converges to the intonational baseline. The effect of voiceless nasals on F0 is comparable 
to, or even greater than, that of voiceless obstruents. The effect of preglottalized nasals on 
F0 is similar to that of the voiced nasals, but individual speakers vary considerably in this 
regard. These findings clearly illustrate the phonetic basis for the patterning of voiceless 
sonorants in tonogenesis and tone splits. 


Keywords: Khmu, voiceless sonorants, preglottalized sonorants, F0, tonogenesis, 
phonetics 
ISO 639-3 codes: kjg 


1 Introduction 

The term sonorant subsumes vowels, nasals, liquids, and glides, i.e., those sounds that are produced 
with a continuous, non-turbulent airflow in the vocal tract. Sonorants are typically voiced, with one 
study estimating that phonologically contrastive voiceless sonorants occur in just 5% of the world’s 
languages (Maddieson 1984a). At least within East and Southeast Asia, however, voiceless sonorants, 
or at least voiceless nasals, are more common: of the 61 languages with voiceless nasals in the 
PHOIBLE database (Moran & McCloy 2019), 23 are spoken in East or Southeast Asia. Voiceless nasals 
in Southeast Asia typically consist of two distinct phonetic phases: a period of voicelessness 
accompanied by nasal airflow (aspiration), followed by a short, sonorous voiced portion (Dantsuji 1984; 
Bhaskararao & Ladefoged 1991). 

Sonorants may also be realized with an accompanying glottal constriction. When this constriction 
precedes the sonorous portion (as opposed to being coextensive with it), these may be referred to as 
preglottalized sonorants. According to surveys (Ruhlen 1975; Maddieson 1984a; Moran & McCloy 
2019), they are similarly rare as phonologically contrastive segments; PHOIBLE lists just 11 instances. 

The importance of voiceless (and to a lesser extent, preglottalized) sonorants in the processes of 
tonogenesis and tone splitting has been remarked on by numerous scholars (Haudricourt 1961; Matisoff 
1973; Chen 1992; L-Thongkum 1992, 1997; Hyslop 2009; Pittayaporn & Kirby 2017; Michaud & 
Sands 2020). It is well established that it is almost always the historical voicing status of a segment, 
rather than whether it is an obstruent or sonorant, that predicts how onsets behave in tone splits. A 
classic example is the evolution of Sgaw Karen (Haudricourt 1961), in which two tones later split into 
four under the influence of the laryngeal specification of the initial consonant: modal voiced stops and 
voiced sonorants conditioned a low register, while preglottalized voiced stops, voiceless aspirated and 
unaspirated stops, and voiceless sonorants conditioned a high register. Slightly more complex is the 
example of Dong (Kra-Dai), where an earlier system of 3 tones (A, B, C) was split three ways, but here 
again, the behavior is clearly conditioned by onset voicing: preglottalized sonorants and plain voiceless 
stops conditioned high register tones, voiceless continuants and aspirated stops conditioned mid register 
tones, and plain voiced sonorants, plosives, and fricatives conditioned low register tones. A 
contemporary example is that of Cao Bang Tai, in which the historical contrast between voiced and 
voiceless sonorants has been lost (neutralized to the voiced versions), and it is only following sonorants 
that all six lexical tones are found (Hoang Van Ma 1997; Pittayaporn 2009; Pittayaporn & Kirby 2017). 
Many similar examples are documented in Haudricourt (1961). 

Data like these are what led Pittayaporn (2009) to propose a model of tone splitting in which 
voiceless sonorants play a critical role. In Pittayaporn’s model, there is a stage of categorical but 
redundant pitch registers based solely on voicing (Stage I), followed by the development of phonemic 
register in the sonorants when the voicing contrast is lost in the sonorant sub-system only (Stage II). A 
phonetic corollary of this model would seem to be that voiceless and preglottalized sonorants should 
have pitch-perturbing properties similar to those of the obstruents they pattern with in historical tone 
splits. 

It is well established that phonologically voiceless obstruents tend to raise F0 on the following 
vowel, while phonologically voiced obstruents either have no effect or, in some instances, can result in 
lowered F0 (House & Fairbanks 1953; Lehiste & Peterson 1961; Hombert 1978; Kohler 1982; Ohde 
1984; Kingston & Diehl 1994; Hanson 2009; Kirby & Ladd 2016; Coetzee et al. 2018; Kirby 2018; 
Gao & Arai 2019; Kirby et al. 2020). However, there is much less work on the pitch-perturbing 
properties of voiceless and preglottalized sonorants. What little phonetic work exists on voiceless nasals 
has focused almost exclusively on Tibeto-Burman languages, primarily Burmese (Dantsuji 1984; 
Maddieson 1984b; Bhaskararao & Ladefoged 1991; Chirkova, Basset & Amelot 2019). Those studies 
that have looked at F0 (Dantsuji 1984; Maddieson 1984b) have found a dichotomy similar to that seen 
in plosives in other languages (i.e., F0 is higher following voiceless sonorants compared to voiced 
sonorants), but these studies did not compare the effects of voiced and voiceless sonorants on F0 with 
those of obstruents, only to one another. Moreover, since Burmese is a tonal language, the co-intrinsic 
F0 effects may well be attenuated (Hombert 1978; Francis et al. 2006; Kirby 2018). What is ideally 
required is a non-tonal language which has the full complement of onset types: voiced, voiceless 
unaspirated, and voiceless aspirated plosives, along with voiced, voiceless, and preglottalized sonorants. 


2 Eastern Khmu (Kmhmu’ Am) 

Luckily, there exists at least one such language: Kmhmu’. Kmhmu’ (also Khmu, Kammu, etc., cf. 
Proschan 1997) is an Austroasiatic language with several distinct varieties, with a total of around 
700,000 speakers primarily in Laos, Thailand, China and Vietnam. Kmhmu’ is fairly well studied, due 
perhaps in part to the relative and sustained vitality of the language, but also because of its importance 
in understanding Austroasiatic more generally. Cheeseman et al. (2017) provide an extensive, although 
not exhaustive, bibliography. 

Kmhmu’ varieties are typically divided into two types, distinguished primarily in terms of lexicon 
and phonology. The most striking difference is the existence of one set of dialects, termed ‘Southern’ 
in the terminology of Lindell et al. (1980; 1981) and ‘Eastern’ in the terminology of Suwilai Premsrirat 
(1987; 1999; 2001; 2004), which retain a rich initial consonant inventory including voicing oppositions 
for both stops and sonorants. In the other main type (‘Northern’ for Lindell,¹ ‘Western’ for Premsrirat), 
the laryngeal contrast of the initial consonants has been lost and restructured as a contrast of vowel 
phonation, tonality or a combination of the two (although no acoustic evidence of a variety 
incorporating phonation type contrasts, primarily or secondarily, is known to exist). The so-called 
‘tonal’ varieties of Kmhmu’, in which onset F0 perturbations have been phonologized, have received 
more attention from phoneticians (Suwilai Premsrirat 1999; 2001; 2004; Svantesson & House 2006; 
Abramson, Nye & Luangthongkum 2007); however, the acoustic properties of the voiceless and 
glottalized sonorants, retained in the more conservative varieties, have not been carefully studied. 

1 Svantesson and colleagues (Svantesson 1983; 1989; Svantesson & House 2006) sometimes use the term 
‘Northern’ to differentiate between allegedly tonal dialects which retain an aspiration contrast (e.g., yuan) 
versus those which apparently do not (e.g., rdak). In Svantesson et al.’s ‘Western Khmu’, the voiced stops of 
the Eastern varieties merged with the voiceless unaspirated series; in ‘Northern Khmu’, they merged with the 
aspirated series. In both cases, however, the old voiced series is distinguished by a lower vowel pitch (with 
supposedly redundant aspiration in the Western varieties). 

This work focuses on a phonologically conservative Southern/Eastern variety spoken in and around 
Vientiane, Luang Prabang, Xiang Khouang, and Bolikhamsay provinces in Laos, referred to by our 
consultants as Kmhmu’ Am /k*muʔ ʔam/ (/ʔam/ being the negative particle shibboleth). In other sources, 
this variety is sometimes known as Kmhmu’ Uu or Kmhmu’ Cwang. Osborne (2018) provides a 
thorough and detailed overview of the phonology of this variety, which features 36 consonants /p' p b 
ttdec3yk*kg?m*mmnnnppngynynw'wwj¥ jshlirr/, of which only 15/ptck?mnpy 
w jj h1r/ occur as codas; 10 monophthongs /i e ¢ a3 9+u0 9/, all of which contrast for length; and 3 
diphthongs /ia ia ua/. 

The conservative phonology of Kmhmu’ Am provides a natural laboratory in which to study how 
the F0 differences between the voiced, voiceless, and preglottalized sonorants compare to the magnitude 
of the differences between voiced and voiceless obstruents. Eastern Khmu thus affords us a window 
into the phonetic structures of a language that may reflect the ancestral state of many of the tonal and 
registral languages of modern Southeast Asia. 


3 Methods 

The current study compares the F0 profiles of voiceless and preglottalized sonorants with those of 
voiced sonorants and voiced and voiceless obstruents. Although the primary focus is on the sonorants, 
it is important to provide an analysis of the obstruents to understand whether, and how, the different 
segment types affect F0 in different ways. We expect to see lowering of F0 during the closure for voiced 
obstruents, but tracking the trajectory of voiced sonorants thereafter, as well as raising of F0 following 
the release of voiceless (aspirated and unaspirated) obstruents, relative to the voiced sonorant baseline. 
Based on their typical diachronic patterning, we expect both voiceless and preglottalized sonorants will 
raise F0 in a manner similar to that of voiceless obstruents. 


3.1 Participants 

This study is based on recordings of 25 speakers (14 female and 11 male, ages 21-69) made in a 
Kmhmu’ village in Vientiane in January 2020. As 4 of our consultants were primarily speakers of a 
slightly different variety (Kmhmu’ Pee), and the recordings of one older male speaker were of 
insufficient quality, the findings reported here are based on a subset of 20 speakers (12 females and 8 
males). All consultants were also fluent in Lao to varying degrees, but all were native speakers of 
Kmhmu’ and spoke Kmhmu’ daily as their primary language. 


3.2 Procedure 
Speakers were recorded reading a list of 125 words, which they produced four times: twice in isolation 
and twice in a carrier phrase /?0? ca law ____ ?an kloh/ (1sg IRR speak ____ SBJV clearly) “I will say 
____ clearly”. Participants produced the Kmhmu’ form in response to an oral prompt of the Lao gloss 
by an experimenter; some participants who were literate in Lao were able to read the glosses themselves. 
Prior to recording, Kmhmu’ assistants went over the Lao glosses with each participant, so that they were 
comfortable with the procedure and familiar with the Kmhmu’ lexical items of interest. Recordings 
were made direct to disk using the SpeechRecorder software (Draxler & Jänsch 2004) with a headset 
condenser microphone in a quiet, sound-treated booth. A simultaneous EGG signal was also recorded 
from most speakers and used to assist in the segmentation, but is not analyzed here. 

The present paper focuses on a subset of the full list, consisting of 59 items with long vowel nuclei 
and one of 15 onsets (see Appendix): voiceless plosives /p t k/, (pre)voiced plosives /b d g/, aspirated 
plosives /pʰ tʰ kʰ/, voiced nasals /m n/, voiceless nasals /m̥ n̥/ and preglottalized nasals /ˀm ˀn/. We also 
recorded examples of voiceless laterals and approximants, and of the velar nasal /ŋ/, but these are not 
analyzed here: the lexical items with these onsets either have short vowels or, in the case of the lateral, 
lack a suitable lexical item with the corresponding voiced onset for comparison. 


3.3 Segmentation 

Target syllables were manually segmented and stored as an EMU speech database (Winkelmann, 
Harrington & Jänsch 2017). Annotations were made on two tiers. The first tier was used to indicate the 
presence or absence of a constriction in the supraglottal vocal tract. In the carrier phrase contexts, this 
was straightforward: for plosives, the closure phase was the period of silence preceding the release 
burst; for sonorants, it was either the sonorous nasal portion (for voiced nasals) or a period of 
silence or frication noise followed by a periodic nasal portion (for voiceless and preglottalized 
nasals). For utterances produced in isolation, no closure phase could be annotated for the voiceless 
plosives; in this context, the visible duration of prevoicing of the voiced plosives was taken as an 
indicator of the closure phase. Similarly, for preglottalized nasals in isolation context, the beginning of 
periodic vibration was taken as the onset of the closure (Figure 1). For voiceless nasals in isolation 
context, the beginning of the closure phase was deemed to be the onset of visible high-frequency 
frication in the spectrogram (Figure 2). In all cases, the onset of the open phase was assessed as either 
the plosive release burst, if present, or the onset of periodic formant structure with a clear second formant. 

The second tier was used to indicate the onset of periodic vocal fold vibration (for plosives) or the 
onset of periodic nasal murmur (for nasals). Particularly for /m̥ n̥/ in carrier phrase contexts, vocal fold 
vibration would often be visible throughout the closure phase, but this was almost always acoustically 
distinct from a ‘true’ nasalized portion, identifiable by increased waveform amplitude and the presence 
of faint formant structure (Figure 3). 

For voiced plosives, voicing would sometimes be present during the closure, but either die off prior 
to the release burst, or be cut off by the burst, and there could then follow a brief period of voicelessness 
before the onset of the vowel. In these instances, three points were annotated: voicing onset during the 
closure phase; voicing offset during the closure phase; and voicing onset during the open phase (Figure 
4). This permitted measurement of both the duration of prevoicing and the post-release voicing lag time 
within the same syllable. 

Sonorant codas were segmented when present, but in the following analysis, vowel and sonorant 
codas are treated as a single unit, since exploratory data analysis suggested that F0 excursions on 
syllables like /da:n/ were timed similarly to those on syllables like /da:/. 


3.4 Analysis 

After segmentation, F0 in the target items was measured at 5 msec intervals using the ksvF0 estimator 
in the wrassp package (Bombien, Winkelmann & Scheffers 2021). Raw Hertz values were 
transformed to semitones for each speaker by converting each value $x$ to $12 \log_2(x/\mu_s)$, where 
$\mu_s$ is that speaker’s mean F0 value. These were used to estimate the mean F0 for the 50 milliseconds 
preceding the closure release (for segments with at least some closure voicing) as well as the first 
50 milliseconds immediately following the release. 
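
The normalization and windowing steps described above can be sketched in a few lines of Python. This is an illustrative sketch only, not the authors' analysis code (the study itself used R packages): the function names, the use of `None` for unvoiced frames, and the choice to compute each speaker's mean over all voiced frames are assumptions made here for the example.

```python
import math
from statistics import mean


def speaker_mean_f0(tracks_hz):
    """Mean F0 in Hz over all voiced frames of one speaker's tokens.

    Assumption: unvoiced frames are stored as None and are skipped.
    """
    voiced = [f for track in tracks_hz for f in track if f is not None]
    return mean(voiced)


def hz_to_semitones(f0_hz, mu_s):
    """Semitone distance of f0_hz from the speaker mean mu_s: 12 * log2(x / mu_s)."""
    return 12.0 * math.log2(f0_hz / mu_s)


def window_mean(track_st, release_idx, side, step_ms=5, window_ms=50):
    """Mean F0 (semitones) over the window before ("pre") or after ("post") the release.

    track_st holds one value per 5 ms frame; release_idx is the index of the
    first frame after the closure release (a hypothetical convention).
    """
    n = window_ms // step_ms  # number of frames in the window
    if side == "pre":
        frames = track_st[max(0, release_idx - n):release_idx]
    elif side == "post":
        frames = track_st[release_idx:release_idx + n]
    else:
        raise ValueError("side must be 'pre' or 'post'")
    voiced = [v for v in frames if v is not None]
    return mean(voiced) if voiced else None
```

A token with no voicing in the pre-release window returns `None` rather than a mean, which mirrors the restriction of the pre-release estimate to segments with at least some closure voicing.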

Figure 1: Example of segmentation of a preglottalized nasal in /ˀmo:n/ ‘place, area’, 
citation form context, speaker F13. [Spectrogram of token F13-0024-iso-rep1; axes: Time (s), 
Frequency (kHz).] 

Figure 2: Example of segmentation of a voiceless nasal in /m̥a:n/ ‘to bury’, citation form context, 
speaker F12. [Spectrogram of token F12-0022-iso-rep1; axes: Time (s), Frequency (kHz).] 

Figure 3: Example of segmentation of a voiceless nasal in /n̥o:n/ ‘still, yet’, carrier phrase 
context, speaker F13. [Spectrogram of token F13-0024-car-rep2; axes: Time (s), Frequency (kHz).] 

Figure 4: Example of segmentation of a voiced plosive in /go:y/ ‘soup’, citation form context, 
speaker F12. [Spectrogram of token F12-0022-iso-rep2; axes: Time (s), Frequency (kHz).] 


4 Results 


4.1 General patterns 

The smoothed conditional mean F0 trajectories² (in semitones) for the 100 milliseconds preceding the 
closure release and the 100 milliseconds following it are shown in Figure 5, faceted by context (isolation 
versus carrier) and manner (nasal versus obstruent). Trajectories are plotted in relative time, rather than 
normalized time, because the normalized time comparison would mask the considerable durational 
differences between the sonorous portions of the voiceless and preglottalized sonorants compared to 
the voiced nasals. Estimates of F0 (in semitones) averaged over the last 50 milliseconds of the closure 
and the first 50 milliseconds following closure release, where the differences are most pronounced, are 
given in Tables 1 and 2. 

Several major trends are apparent from these figures. First, there is considerable lowering of F0 
during the closure phase of the voiced obstruents, relative to voiced nasals (see also Figure 6). Second, 
F0 is raised in voiceless nasals both during the nasalized closure portion as well as during the following 
vowel. This difference persists for at least the first 100 msec in isolation forms; in the carrier phrase 
context, the effect is clearly attenuated, but there is still a mean difference of around 1 semitone 
compared to the voiced and preglottalized nasals in the region around the closure release. As highlighted 
in Figure 6, in the citation forms, F0 following voiceless nasals is slightly higher than following 
voiceless obstruents, but this effect disappears in the carrier phrase context. 


Figure 5: Mean F0 trajectories (GAM smooths) over last 100 msec of closure and first 100 msec 
post-release for nasals and obstruents by utterance context. [Panels: nasal, obstruent; legend: voiced, 
voiceless, glottal, aspirated; x-axis: relative time (msec).] 


² F0 trajectories were smoothed by fitting a generalized additive model (GAM) using a cubic spline basis with 
shrinkage and 6 evenly spread knots; see Wood (2006) for further details. 

Figure 6: Mean F0 trajectories (GAM smooths) over last 100 msec of closure and first 100 msec 
post-release for voiced plosives vs. voiced nasals (left column) and voiceless plosives vs. voiceless 
nasals (right column) by utterance context. [Legend: nasal, obstruent.] 

Table 1: Mean and standard deviation of F0 (in semitones) averaged over last 50 milliseconds 
preceding closure release. Onset labels that are illegible in the source are shown as [?]. 


                    isolation               carrier phrase 
place     onset     F0 (mean)   F0 (SD)     F0 (mean)   F0 (SD) 
labial    b         -2.64       2.05        -2.78       1.72 
          m          0.92       1.64         0.35       1.60 
          [?]        1.05       1.69         0.98       1.27 
          ˀm         0.23       1.46        -0.72       1.32 
alveolar  d         -3.24       2.20        -3.12       1.65 
          [?]        1.56       1.79         0.55       1.76 
          [?]       -0.59       1.75        -0.47       1.46 
          [?]        0.02       1.50        -0.76       1.76 
velar     g         -3.19       2.48        -2.89       1.84 

Table 2: Mean and standard deviation of F0 (in semitones) averaged over first 50 milliseconds 
following closure release. Cells that are illegible in the source are shown as [?]; onset labels are 
given as they survive in the source, without their diacritics. 


                    isolation               carrier phrase 
place     onset     F0 (mean)   F0 (SD)     F0 (mean)   F0 (SD) 
labial    b          0.62       2.02         0.24       1.67 
          m          1.04       1.28         0.30       1.16 
          m          0.28       1.71         0.21       1.47 
          p          0.88       2.06         0.80       1.49 
          p         -0.04       2.81         0.05       1.08 
          ˀm         0.66       1.23         0.49       1.10 
alveolar  d         -0.20       2.01        -0.30       1.62 
          n          [?]        1.64         1.04       [?] 
          n          0.60       1.79         0.38       1.64 
          ˀn         0.11       1.12        -0.15       1.28 
          t          1.16       2.26         0.93       1.74 
          tʰ         0.42       2.94        -0.06       2.05 
velar     g         -0.29       2.09        -0.16       1.50 
          k          0.05       1.80         0.01       1.35 
          kʰ         1.39       0.76         0.98       1.57 


4.2 Individual differences 

While space does not permit a full exploration of the individual patterns, a few examples should make 
it clear that speakers are not homogeneous in how their voiceless and preglottalized nasals 
perturb the F0 trajectory. 

For most (n=15) speakers, preglottalized sonorants do not generally raise pitch during the closure 
or following it, relative to /m n/. For some speakers, such as M3 (Figure 7), pitch following /ˀm ˀn/ is 
lower than that of the corresponding voiced nasals, at least in carrier phrase context; female speaker F7 
(Figure 8) shows a similar pattern. However, for the remaining 5 speakers, exemplified here by M2 and 
F4 (Figures 9 and 10), both voiceless and preglottalized nasals condition higher F0 compared to voiced 
nasals, in isolation and in the carrier phrase, in the (short) sonorous period during the closure and, 
in M2’s case, extending well into the following vowel. 

The existence of these two types of speakers appears to be what gives the impression of the 
preglottalized sonorants being “intermediate” in terms of their effects on F0 when we look at the group 
average in Figure 6. In fact, there seem to be two main groups of speakers: those for whom 
preglottalized sonorants pattern with the voiceless sonorants (and obstruents), and those for whom they 
pattern with the voiced sonorants (and obstruents). Thus, the claim that “voiceless unaspirated stops [in 
Eastern Kmhmu’] are phonetically stiff voiced, bringing them in-line with the glottalised sonorants” 
(Osborne 2018:71) may bear further scrutiny. 

Figure 7: Smoothed mean F0 trajectories for speaker M3 (male, age 34). [Panels: nasal, obstruent; 
legend: voiced, voiceless, glottal, aspirated; x-axis: relative time (msec).] 

Figure 8: Smoothed mean F0 trajectories for speaker F7 (female, age 32). 

Figure 9: Smoothed mean F0 trajectories for speaker M2 (male, age 38). 

Figure 10: Smoothed mean F0 trajectories for speaker F4 (female, age 37). 


5 Discussion 

In this sample of speech from 20 speakers of Eastern Khmu (Kmhmu’ Am), voiceless nasals condition 
F0 raising relative to voiced nasals, both during the oral closure as well as following the release. This 
finding is consistent with earlier work on Burmese (Dantsuji 1984; Maddieson 1984b). In addition, by 
comparing trajectories (rather than just point estimates) and by analyzing obstruents as well as 
sonorants, the present work shows that the magnitude of the F0 perturbation is at least on par with that 
conditioned by voiceless plosives, and in general, the temporal extent is greater, at least in isolation 
forms. 

The behavior of the preglottalized nasals with respect to F0 was found to be more variable. For 
some speakers, F0 seems to be raised following these segments, but for many, preglottalized nasals 
appear to pattern with voiced nasals in terms of their (lack of) effects on F0. It is worth mentioning that 
the preglottalized nasals in Eastern Khmu varieties correspond to “voiced, slightly implosive” stops in 
other Khmu varieties (Svantesson & Holmer 2014:963), and implosives are known to exhibit variable 
patterning with respect to tone both synchronically (Tang 2008) and diachronically (Haudricourt 1961; 
Gedney 1972; Hombert 1978). 

The present findings provide phonetic support for the proposal that tone splits might well begin 
with voiceless sonorants, at least in languages which have them. The effect of this class of segments on 
F0 is at least as great as that of voiceless obstruents, and because there is no voice break between the 
(admittedly short) nasal and the following vowel, the F0 contour is potentially even more audible 
compared to voiceless obstruents. Coupled with the fact that, in carrier phrase contexts, the aspiration 
of the voiceless nasal is likely to be difficult to perceive, this seems to provide the perfect conditions 
for the actuation of sound change (Ohala 1993; Janda & Joseph 2003). 

An important question to address in future work is whether the magnitude of the F0 
perturbation varies with the duration of the voiceless nasal. If F0 is high even when there is voicing 
throughout the closure, this would suggest that the primary difference is one of laryngeal tension setting, 
not of the perceptually rather fragile presence versus absence of nasal frication. Indeed, it may well be 
the presence of a shared articulatory posture, rather than any particular acoustic effect, which drives 
voiceless (and in some cases, preglottalized) sounds to pattern together in tone splits. 

Also interesting is the extent to which F0 is lowered during the closure for voiced plosives. This 
effect was very strong in the data examined here, suggesting a phonetic basis whereby prevoiced 
plosives might diverge from other segment types. Yet historically, voiced plosives usually pattern with 
voiced nasals in terms of tone splits. This patterning may be related to the fact that prevoiced plosives 
are often partially or completely devoiced, as seen in e.g., Chru (Brunelle et al. 2020) or European 
Portuguese (Pape & Jesus 2015), and when this happens, their post-release F0 profiles are often 
indistinguishable from those of voiced nasals. The devoicing rate is one of several acoustic 
properties, along with spectral balance, which bear on this question and which should be examined in 
future work. 


Appendix: Wordlist analyzed in present study 


IPA          Gloss 
ba:          you (female) 
bar          two 
bit          to put out fire, extinguish 
bu:          puffy; swollen 
bu:c         liquor 
de:r         spread out (fishing net) 
do:m         beautiful, sweet, natural (sound) 
da:          to apply, paint 
da:y         lizard 
da:l         dull 
(hn)du:m     ripe 
go:y         soup 
ga:          climb 
ga:y         house 
ge:t         to pour (water) 
gi:          here, this 
gu:m         to winnow the paddy 
m̥a:n         to bury 
ma:r         salt 
nen          hard, tight 
n̥o:n         still, yet, remain 
na:j         that (dem.) 
ni           debt 
ke:n         tighten at the waist 
ko:k         caterpillar 
ko:l         to cut down 
ko:n         offspring, child 
ka:l         before 
ka:p         chin 
ku:p         to cook food by wrapping in a banana leaf and roasting in the fire 
kʰo:l        whistle 
kʰa:l        thin bamboo strip used for making baskets 
kʰy:         here (loc. adv.) 
kul          body hair 
ma:m         blood 
mo:j         one (numeral) 
mu:m         to take a bath 
m9:j         grease, oil 
(hr)no:m     thin bamboo strip used for tying things 
na:          she (pronoun); rice paddy? 
nu:m         urine 
pel          catch and watch 
po:k         burn, grill over fire 
pail         birthmark 
pu:          empty rice husk 
pu:c         to take off clothes 
pu:l         four 
po:j         fan, blow 
pʰa:n        to kill 
ˀmo:k        to tell, teach 
ˀmo:n        place, area 
ˀnan         clf. for nets 
ˀnarm        same size 
ta:j         older sibling 
ta:k         to spit out 
tin          to fall down (house, tree) 
tu:          to falsely accuse 
tu:t         plant 
tʰark        to strip bark off of tree 


Abstract 

The Kanise Khumi language of the Khomic branch of Kuki-Chin is a typologically 
interesting case in exhibiting distinctive surface pitch patterns and vowel variation [ɨ~ɜ~a]. 
This paper provides a descriptive account of minor syllables in Kanise Khumi with an acoustic 
analysis of a set of words that correspond to minor syllables in seven other Khomic 
varieties. The study aims to elucidate the phonetic properties of minor syllables in the 
language and investigate how they fit into the description of this syllable type in the 
Khomic context. Archived data (Bryant 2020) is analyzed in terms of vowel duration, 
vowel formants (F1 and F2), and pitch (F0). The results show that minor syllables in 
Kanise Khumi resemble minor syllables of other Khomic varieties to a considerable extent. 


Keywords: Kuki-Chin, Khomic branch, Khumi, minor syllable 
ISO 639-3 codes: cek 


1 Introduction 

Minor syllables typically have a reduced number of contrastive segmental features compared to major 
syllables, which allow a fuller range of contrasts. Moreover, the degree of phonotactic restriction on minor 
syllables varies cross-linguistically, and the phonetic description of elements that fit this syllable type 
consequently differs between languages (Thomas 1992; Butler 2014). As the non-final syllable of a 
sesquisyllable, this syllable type is prevalent in the Khomic family [Kuki-Chin, Peripheral Group 
(Peterson 2017)], which belongs to the Tibeto-Burman stock, in which sesquisyllables are well-attested 
(e.g., Matisoff 1973; Thomas 1992; Michaud 2012). Descriptions of languages in the Khomic family adopt 
a more restrictive definition of minor syllable than those of other sesquisyllabic languages. In previous 
descriptions of Khomic languages, minor syllables are described as underlyingly vowelless syllables, 
consisting of a single consonant. This interpretation implies that pitch and vowel quality are surface-level 
realizations. This is hardly surprising, since pitch and vowel quality carry a low functional load and are 
not pertinent to phonological descriptions of minor syllables in Khomic languages (Herr 2011; Hornéy 2012; 
Peterson 2019). 

In Kanise Khumi (hereafter, Kanise), a Khomic variety spoken by approximately 5,776 people in 
Chin State, Myanmar (Ikeda 2021), 1,124 out of 2,000 words in a collected wordlist contain a purported 
minor syllable. Baleno (2020) conducted an impressionistic study of sesquisyllabic nouns in Kanise. 
She found that minor syllables exhibit high surface pitch variation from the conventional mid pitch. 
The vowel quality of minor syllables also varies as far as [ɨ~ɜ~a] from the conventional [ə]. However, 
studies of related Khomic varieties do not report vowel and pitch contrasts in minor syllables (Herr 2011; 
Hornéy 2012; Peterson 2019). This discovery makes Kanise a typologically interesting case. 
Furthermore, it raises the question of to what extent minor syllables in Kanise fit into the 
descriptions of minor syllables in the language family. 

With this in mind, this paper presents the first acoustic analysis of Kanise’s minor syllables. To 
avoid L1 and auditory-impression bias, a set of 113 words was chosen that are sesquisyllabic cognates 
with at least one of seven related Khomic varieties. Minor syllables were analyzed in terms of vowel 
duration, vowel formants (F1 and F2), and pitch (F0). 

This paper serves as a pilot study and aims to provide a descriptive acoustic account of Kanise’s 
minor syllables. The results of the study give more direction for ongoing research on morphotonemic 
studies of the language. §2 presents basic information about Kanise (language classification and 
phonology). §3 reviews previous descriptions of minor syllables in related varieties and in Kanise. §4 
proposes the research questions of this study. §5 discusses the methodology of the acoustic analysis. §6 
presents the results, while §7 provides a qualitative discussion of the results. Finally, §8 ends the paper 
with concluding remarks on the results, limitations and future directions. 


2 Language overview 


2.1 Phonology 
Kanise has a canonical word structure of [C1(V1)(T1)]C2(C3)V2(N)T2. The parentheses represent 
optional elements. The first three elements, [C1(V1)(T1)], make up a minor syllable in a sesquisyllabic 
word. In minor syllables, only the consonants /p t k ʔ s n m r ts/ are attested in the C1 slot. The 
phonological status of the minor syllable vowel (V1) and tone (T1) is currently under study. There is no 
constraint on the type of consonant that can occur as an onset in a full syllable (see Table 1 for the onset 
inventory). However, if an onset cluster is present, the first consonant (C2) is restricted to a voiceless 
plosive, and the second consonant (C3) to a liquid: /r/ or /l/. 
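
As an illustration of the canonical template only, the following Python sketch checks whether a CV-skeleton string fits [C1(V1)(T1)]C2(C3)V2(N)T2. It is not part of the original study: the skeleton notation (letters C, V, N, T, with a dot separating the minor syllable from the major syllable) and the function name are assumptions made for this example, and the segmental restrictions on C1, C2 and C3 are not modelled.

```python
import re

# Optional minor syllable C(V)(T) followed by a dot, then the major
# syllable C(C)V(N)T. The dot is a notational assumption that keeps a
# vowelless minor syllable "C." apart from an onset cluster "CC".
CANONICAL_WORD = re.compile(r"^(?:CV?T?\.)?CC?VN?T$")


def fits_canonical_template(skeleton):
    """Return True if the skeleton matches [C1(V1)(T1)]C2(C3)V2(N)T2."""
    return CANONICAL_WORD.match(skeleton) is not None


if __name__ == "__main__":
    for shape in ["CVT", "C.CVT", "CVT.CCVNT", "CV", "CVNT.CVT"]:
        print(shape, fits_canonical_template(shape))
```

The sketch covers only the canonical shape stated in the text; longer word types would need additional syllable slots.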

Kanise has nine simple vowels and four diphthongs, as in Table 2. All open-mid vowels are 
slightly diphthongized, with vowel height rising towards the end of their articulation. The language has 
three phonemic tones, as listed in Table 3, one of which is a checked tone. 


Table 1: Onset inventory of Kanise 


p i Ke 
p t k ? 
b d 
S h 
Vv 
ts 
m i} 


=: 
= 


Table 2: Vowel inventory of Kanise 


Monophthongs Diphthongs 
i i u ui au 
e oO 
€ [e°] 3 [3°] 9 [9°] oe ai 
a 


Table 3: Tonal inventory of Kanise (Ikeda 2021) 


No. Description Notation 
1 High Level Modal tone [55] 
2 Falling Low Breathy tone [31] 
3 Mid Glottalized tone [33°] 

A Kanise word is minimally composed of a CVT structure. A coda is optional and is restricted to the 
placeless nasal (N) only. The nasal coda may impart nasalization onto the preceding vowel and may be 
optionally deleted. Table 4 lists all attested word structures in Kanise. On the whole, the phoneme 
inventory of Kanise can be considered relatively symmetrical and resembles those of related Kuki-Chin 
varieties such as Mara (Luce 1985), Kaang (So-Hartmann 1988), and Lemi Chin (Herr 2011). 


Table 4: Attested word types in Kanise 


Word type Example Gloss Word type Example Gloss 
CV bul cooked rice CVN.CV louNJ ta:1 grindstone 
CCV krad deadfall CVN.CVN péN] peN4 dibble 
CCVN krouN4 trap N.CV ni nat one 
Ca.CV totpo:1 cave Ca.CVN.CV ?a1 deN1 tad to pound rice 
Ca.CVN kotleNJ cliff CV.CV.CV at h34 lis] single 
CV.CV 2a siz] star CV.CVN.CV_ | ?a1 p*oNi ta?d to pull out 
CV.CCV pal tlé:1 basket CV.CCV.CV ta:1 tl34 tad to weed 
CV.CVN tsoJ bat sheaf of grain N.CV.CV m1.63J ja4 to give birth 


2.2 Language classification 

Kanise Khumi speakers live in villages south of the Sami subtownship in Chin State, Myanmar (Joseph 
Bryant, p.c.). The language is associated with the language code [cek] “Eastern Khumi Chin,” which is 
an overarching code shared with other highly mutually intelligible dialects such as Asang, Khenlak, 
Khongtu, Lemi, and Likhy (Eberhard et al. 2020). Alternative names for this language include Nideun 
(Eberhard et al. 2020), Tahaensae (or Taheunso) (So-Hartmann 1988), and Uiphaw, which are names of 
the Kanise Khumi-speaking clans (Nathan Statezni, p.c.). As the name suggests, the language may very 
well belong to the “Khumi” cluster within the Kuki-Chin branch of the Tibeto-Burman 
family (Peterson 2017). On the basis of similar linguistic considerations, various scholars have 
placed Khumi under the Southern Chin group (Grierson 1927; Benedict 1972; Shafer 1974; 
Peiros 1998) or other Chin groups (Bradley 1997). In light of phonological and morphosyntactic 
evidence, Peterson (2000) formulated the Center/Periphery model in classifying the Khumi cluster 
group. He adopted the term Khomic group (formerly known as the Southwestern group) as a first-order 
branch within the Peripheral group alongside the Southern and Northeastern groups (Peterson 2017), 
as shown in Figure 1. No further classification has been proposed within the Khomic group. 


Figure 1: South Central Tibeto-Burman (Kuki-Chin) subgroupings (Peterson 2017) 

[Tree diagram. Top node: South Central Tibeto-Burman (= Kuki-Chin); first-order branches: Northwestern 
(Old Kuki), Central, Peripheral. Lower-level labels legible in the source: Maraic, Northeastern, 
Southeastern, Khumi, Mro-Khimi, Rengmitca, Lemi.] 


3 Previous studies of minor syllables in the Khomic family 


3.1 Previous studies in related varieties 

Minor syllables have been described for three varieties of the Khomic sub-branch: Lemi Chin 
(Herr 2011), Mro Khimi (Hornéy 2012), and Bangladesh Khumi (Peterson 2010; Peterson 2011; 
Peterson 2019). In this language family, minor syllables function either as fossilized lexical morphemes 
from historical compounds or as grammatical morphemes. These formatives can be lexicalized in some 
cases (Peterson 2013). 

Minor syllables typically have a much shorter duration than major syllables, and the same holds 
for minor syllables in Khomic languages. Herr (2011) measured the vowel length of minor and major 
syllables across ten sesquisyllables in Lemi Chin. She found minor syllables to have an average duration 
of 0.0565 seconds compared to 0.2152 seconds in major syllables, which are thus about four times as long. 
As for vowel quality, all documented Khomic varieties consistently exhibit non-contrastive or 
predictable minor syllable vowels, even though phonemic central vowels are present in these languages 
and can be auditorily indistinguishable from minor syllable vowels. Lemi Chin has vowel phonemes /4, 
9, 6, a/, but among them [9 and 9] are also found to be surface realizations of minor syllable vowels. 
Moreover, all surface vowels [3, 9, ¢, 8, 9, 9] of minor syllables have similar vowel quality compared to 
the phonemic vowels in the central vowel range. Despite having phonemic centralized back vowels /tu/ 
[+ ~ ua], /s/ [y ~ 9], minor syllables in Mro Khimi are described as carrying a “surface vowel” that “has 
a much shorter duration than vowels in other syllables.” In Bangladesh Khumi, Peterson noted a short 
vowel associated with the sesquisyllabic structure. He labeled the vowel as “optional” in terms of 
well-formedness (Peterson 2011). This “optional vowel” is distinct from the phonemic close-mid /9/ 
vowel. As a whole, Khomic varieties conform with the conventional view that the vowel quality of 
minor syllables is not phonologically specified. 

As observed above, the syllable structure of minor syllables is generally understood as an 
underlyingly vowel-less /C/: “/C-/” in Lemi Chin, “/Cə-/” with “inserted schwa” in Mro Khimi, 
and “/C(v)/” with “optional vowel” in Bangladesh Khumi. Since there is no phonological nucleus 
present, minor syllables are considered non-tone-bearing or tonally irrelevant, and any surface pitch 
found is non-contrastive. It has been reported that Lemi Chin has a predictable mid-tone as the default 
tone of minor syllables (Herr 2011). Bangladesh Khumi, on the other hand, restricts minor syllables 
from occurring with all five lexical tones. In Mro Khimi, tone assignment in minor syllables is 
post-lexical and comes after vowel [ə] insertion as a two-part rule (Hornéy 2012). This means that vowel 
and tonal quality are as yet unspecified before rule assignment. 


3.2 Previous studies in Kanise Khumi 

Baleno (2020) conducted a preliminary study on purported sesquisyllabic nouns in Kanise Khumi. Her 
study revealed that surface minor syllable vowels span as far as [ɨ~ɜ~a] but are primarily realized as 
schwa [ə]. Syllabic nasal minor syllables [?n] were also present in her study. In addition, she found that 
minor syllables with a surface [a] vowel are longer in duration than other minor syllables. She concluded 
that this [Ca] form requires further study to elucidate its status as a minor syllable. In the author’s 
observation, the surface [ɜ] vowel repeatedly occurs with minor syllables of the same morpheme. This 
morpheme has hitherto been speculated to be the (inalienable) possessive morpheme (Peterson 2011). 
It is also noteworthy that the [ɜ] vowel is exceptionally similar in phonetic quality to the conventional 
[ə] vowel and shows up in major/full syllables as a phonemic vowel (/ɜ/). 

As for minor syllable pitch, Baleno suggested that minor syllables primarily carry a surface mid 
pitch but also show a high-pitch variant. Both surface pitches align well with the description of the 
lexical mid glottalized (M) tone and high-level modal (H) tone of Kanise (Ikeda 2021): the high 
surface pitch spans 143-186 Hz (average 162 Hz), compared to 150-170 Hz for the lexical H tone, and the 
surface mid pitch spans 128-164 Hz (average 146 Hz), compared to 133-158 Hz for the M tone. Thus, there 
are two possibilities: either minor syllables are restricted to a default mid-tone, as in Lemi Chin, or 
they may indeed have phonological variation, as in Mro Khimi. In sum, minor syllables in Kanise are a 
typologically interesting case for an acoustic analysis. All Khomic varieties show slight variations in 
the phonetic properties of their minor syllables. 


4 Research Questions and Methodology 

The rest of this study addresses the following research questions: 

i. How does the duration of minor syllable compare to the duration of major syllables? 

ii. Does the tone of the major syllable affect the duration of the minor syllable? 

iii. Does the minor syllable surface vowel [a] share the same vowel space with the surface vowel [ɜ]? 
iv. Is the pitch of the minor syllable affected by the pitch of the major syllable? 


The following sub-sections outline the research methodology of the study: §4.1 describes the data
source (collection method and subjects), §4.2 the materials used (data selection and processing),
and §4.3 the research procedure.


4.1 Data collection 

The Kanise data used in this study are sourced from a 2,076-item wordlist provided by Joseph Bryant.
The wordlist is based on the EFEO-CNRS-SOAS Word List for Linguistic Fieldwork in
Southeast Asia (Pain et al. 2019). Bryant assembled it during a one-week elicitation session
in 2019. The main informant was a male native speaker in his 50s, with a younger male
speaker furnishing another 193 words. After the elicitation, Bryant transcribed all 2,076 items in
the wordlist and supplied a Kanise orthographic form for each. He also made recordings of every entry in the
wordlist in WAV format with a sampling rate of 44.1 kHz. It is essential to point out that the variety
elicited cannot represent all Kanise-speaking clans, only the one spoken by the subject and
his village. Also, background noise can be heard in some of the recordings, but it does not render
speech features unrecoverable or heavily distorted. The 2,076-item wordlist data
are archived in Zenodo and accessible upon request (Bryant 2020).


4.2 Materials 

As Kanise is in an early documentation stage, there is no adequate knowledge of the semantic and 
phonological make-up of minor syllables in the language. Hence, a wordlist consisting of a set of 
sesquisyllabic cognates (n=113) was prepared. The wordlist is sourced from seven related Khomic 
varieties in which the sesquisyllabic word type has already been established. This minimizes the 
likelihood of using irrelevant study material for this paper. Each item in the wordlist is cognate with at 
least one of the seven related Khomic varieties (cf. Table 9). The details of the related varieties are in 
Table 5. 


Table 5: Eight Khomic varieties related to Kanise Khumi chosen in this study


Given language name | Sourced from | Spoken in | Notes
1. Bangladesh Khumi | Peterson 2010, 2011, 2019 | Ruma Bazaar, Bandarban Hill Tracts, Bangladesh | -
2. Lemi Chin | Herr 2011 | Paletwa township | -
3. Mro Khimi | Hornéy 2012 | Paletwa and Rakhine townships | -
4. Awa Khumi | Luce 1985 | Tamantha, west-southwest of Paletwa | Based on Saptawka (1934)
5. Khomi | So-Hartmann 1988 | Paletwa township | -
6. Ahraing Khumi | Luce 1985 | Kaletwa township | -
7. Khumi | VanBik 2009 | Ruma Bazaar, Bandarban Hill Tracts, Bangladesh | Based on Peterson (2013)
8. Khimi | Shafer 1944 | unknown, possibly Southern Paletwa | -

Further interval segmentation and labeling of the wordlist sound files were done using TextGrid
creator and editor scripts run in Praat version 6.1.39 (Boersma and Weenink 2021). All sound files
were downsampled to 10,000 Hz when exported into TextGrid. The rhymes of major and minor
syllables were segmented individually on a labeled tier, with boundaries set at the beginning and end of
the rhyme (the onset and offset of the second formant, respectively) where possible. The acoustic
measures of the labeled segments were then extracted in Praat using the PraatSauce script
(Kirby 2020). Finally, the extracted results were gathered into several ensemble files by rerunning the script,
and graphical representations of the data were produced using the plotnine package (Kibirige et al.
2021). Note that only the second of the three repetitions of each item was analyzed throughout the study.
In addition, maximum formant values were set to 5,000 Hz for males and 5,500 Hz for females.


4.3 Procedure 

This pilot study is designed to describe the phonetic properties of Kanise minor syllables and
investigate how they fit into the description in the Khomic context. Hence, the study looked at the
acoustic output of duration, vowel formants (F1 and F2) and pitch (F0).

The first experiment examined the vowel duration of minor syllables as compared to that of major
syllables. In this experiment, the tone carried by the corresponding major syllable is held as a
control variable since tone can affect the pitch and duration of neighboring syllables (Wang and Chen
1994). Therefore, the wordlist is grouped into three categories: category H (high-level modal tone,
n=36), category M (mid-glottalized tone, n=32), and category L (low falling breathy tone, n=47). In
addition, the potential effect of the tonal categorization on the duration of the minor syllable was kept
under observation. However, the primary interest of this study was the durational difference between
minor and major syllables.

The second experiment compared the vowel quality of the conventional surface vowel [ə] versus
the surface [ɜ] vowel of the possessive morpheme to see whether they share the same vowel space or
are acoustically distinct. Of the 113 tokens in the wordlist, 17 contained this
morpheme (cf. Table 8 for the wordlist of [ɜ] tokens). Since the rate of syllable reduction differs between
languages, it remains unclear whether the sesquisyllable in Kanise still allows vowel contrast in its
minor syllable. If [ɜ] is a phonologically specified vowel quality, we would expect
to see a higher F1 and a narrower range of F2 values. By contrast, we would expect [ə] to exhibit a
wider range of formant values (Davidson 2006). It is worth mentioning that five minor syllable tokens
in the wordlist contain vocalic elements that are distinctive in terms of vowel quality: three tokens of
[a], one of [4], one of [o]. There are also three syllabic nasals [n̩], which are not vowels at all.
This finding corroborates Baleno’s (2020) description of [4], [a], and [n̩] surface variation on minor
syllables. Though it is not conventional to see minor syllables that carry such distinctive vocalic
elements, they are admissible as minor syllable material in that: i) they form a weak-strong pattern
(disyllabic iamb) with the following syllable and are shorter in duration (Butler 2014), and ii) they could
be minor syllables that carry contrastive phonemes in their vocalic elements as per Thomas’ (1992) type
III and IV minor syllables.

Since this paper does not address the working definition of the minor syllable in Kanise, these tokens were
included in the experiment. However, they were flagged as potential outliers
in terms of vowel quality. In measuring vowel formants, only values from the middle 50% of the
minor syllable rime were measured and then averaged. This rules out potential extreme variance
conditioned by neighboring consonants, especially glides and nasals, or glitches in formant
estimation.
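The middle-50% averaging described above can be illustrated with a minimal sketch. It is not the PraatSauce implementation: the function name `middle_half_mean`, the 10 ms sampling step, and the F1 values are hypothetical and chosen only to show how samples near the rime edges are discarded before averaging.

```python
from statistics import mean

def middle_half_mean(times_ms, values, start_ms, end_ms):
    """Average the samples whose time stamps fall in the middle 50% of a rime."""
    span = end_ms - start_ms
    lo = start_ms + 0.25 * span   # skip the first quarter of the rime
    hi = end_ms - 0.25 * span     # skip the last quarter of the rime
    kept = [v for t, v in zip(times_ms, values) if lo <= t <= hi]
    if not kept:
        raise ValueError("no samples inside the middle 50% window")
    return mean(kept)

# Hypothetical F1 track (Hz), sampled every 10 ms over an 80 ms minor syllable rime.
times_ms = [0, 10, 20, 30, 40, 50, 60, 70, 80]
f1_hz = [420, 480, 540, 550, 560, 550, 540, 470, 400]

# Only the samples at 20-60 ms contribute; the edge values, which are the most
# likely to be coloured by neighbouring consonants, are ignored.
f1_mid = middle_half_mean(times_ms, f1_hz, start_ms=0, end_ms=80)
```

In this toy example the edge samples pull the plain average down, while the middle-50% average stays close to the vowel's steady state.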

The third experiment measured the average F0 contour of minor and major syllables across the three
tonal categories. We are interested in how the F0 contour patterns over minor syllables and in the
stability of the contour. If the F0 contour merely follows the trajectory of the major syllable pitch,
this suggests that the minor syllable pitch is interpolated from context and might be
phonologically unspecified (no pitch specification). On the other hand, a stable tone pattern that is not
explicable by contextual interpolation may suggest a contrastive pitch target (West 2014) or a tonally
irrelevant default pitch. In measuring pitch, the duration of the rime of the minor syllable was
normalized relative to that of the major syllable: the minor syllable is fitted between t = 0
and t = 0.2 and the major syllable between t = 0.2 and t = 1.
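As an illustration of this time normalization, the following sketch linearly maps the sample times of each rime onto its share of the normalized axis. The helper `normalize_times` and the millisecond values are assumptions made for the example; the study's own normalization was carried out with its Praat-based workflow, which may differ in detail.

```python
def normalize_times(times_ms, start_ms, end_ms, lo, hi):
    """Linearly map time stamps in [start_ms, end_ms] onto the interval [lo, hi]."""
    span = end_ms - start_ms
    return [lo + (t - start_ms) / span * (hi - lo) for t in times_ms]

# Hypothetical sample times: a 50 ms minor rime and a 200 ms major rime.
minor_times = [100, 125, 150]
major_times = [200, 300, 400]

# The minor syllable occupies t = 0 to 0.2, the major syllable t = 0.2 to 1.
minor_norm = normalize_times(minor_times, 100, 150, lo=0.0, hi=0.2)
major_norm = normalize_times(major_times, 200, 400, lo=0.2, hi=1.0)
```

Because both rimes are rescaled independently, tokens of different absolute duration can be averaged point by point on the same normalized axis.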


5 Results


5.1 Vowel duration 

When it comes to vowel duration, the mean vowel durations of minor syllables in the three tonal
categories are not significantly different from each other, as evidenced by the boxplot in Figure 2 and
the durational measurements in Table 6.


Figure 2: Vowel duration of major and minor syllables in Kanise across three tonal categories

[Boxplot panels: Major syllable, Minor syllable; y-axis: Duration (seconds); grouping: Tone]


Table 6: Mean vowel duration of minor and major syllables in Kanise across three tonal categories
and their durational ratio


Tone of major syllable | Syllable type | Mean duration | Minor/major
H | Minor | 59.4ms | 0.1638/1
  | Major | 362.6ms |
M | Minor | 71.6ms | 0.3212/1
  | Major | 222.9ms |
L | Minor | 92.4ms | 0.3344/1
  | Major | 276.3ms |
Average | Minor | 74.5ms | 0.2593/1
  | Major | 287.3ms |
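The ratio column of Table 6 can be recomputed from the mean durations with a short sketch. The variable names are illustrative, and the sketch assumes that the Average row is the unweighted mean of the three category means, rounded to one decimal before the ratio is taken; that assumption reproduces the tabulated figures.

```python
# Mean durations in ms from Table 6: tone -> (minor syllable, major syllable).
mean_ms = {"H": (59.4, 362.6), "M": (71.6, 222.9), "L": (92.4, 276.3)}

# Minor/major ratio per tonal category, rounded to four decimals as in the table.
ratios = {tone: round(minor / major, 4) for tone, (minor, major) in mean_ms.items()}

# Assumption: the Average row is the unweighted mean over the three categories.
avg_minor = round(sum(minor for minor, _ in mean_ms.values()) / len(mean_ms), 1)
avg_major = round(sum(major for _, major in mean_ms.values()) / len(mean_ms), 1)
overall_ratio = round(avg_minor / avg_major, 4)

# Spread between the longest (L) and shortest (H) mean minor syllable, in ms.
minor_spread = round(mean_ms["L"][0] - mean_ms["H"][0], 1)
```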


The tone carried by the major syllable seems to correlate strongly with the duration of the major
syllable. For example, major syllables in the H category, which carry an unchecked tone (high modal),
are the longest. On the other hand, major syllables in the M category, which carry a checked
tone (mid-glottalized), are the shortest. However, the tone of the major syllable does not appear to
correlate with the duration of the minor syllable: there is merely a 33ms difference between the
longest (L category) and shortest (H category) minor syllables on average.

In addition, as for the durational difference between minor syllables and their corresponding major
syllables, sesquisyllables in category H have the shortest minor syllable and the longest major syllable among
the three categories. This results in the greatest difference in length (59.4ms/362.6ms), whereas
category L has the smallest relative difference (92.4ms/276.3ms). As a whole, minor syllables have
about one-fourth (~26%) of the duration of major syllables (0.2593:1).


5.2 Vowel quality

This part of the study is concerned with the vowel quality of the Kanise minor syllable. As a whole,
minor syllable vowels in Kanise have their F1 and F2 values distributed around the mid-central range,
with the centroid positioned around F2 = 1400 Hz and F1 = 550 Hz, as demonstrated in Figure 3.


Figure 3: F1 and F2 distribution plot of the Kanise minor syllable vowel

Moreover, most of the extreme variants in the distribution plot are identified as tokens with distinct
vowel qualities (the aforementioned [%, a, o] vowels and syllabic nasals in §4.3). These outliers are
plotted in red in the diagram, with their values falling far from the mid-central vowel range. In
Figure 4, minor syllables that consist of the possessive morpheme [ʔɜ] (n=17) are contrasted with all
other minor syllables [ʔə] (n=96). The result (Figure 4) suggests that, generally speaking, [ʔɜ] has
slightly higher F1 values, but the difference is not striking.


Figure 4: F1 and F2 distribution plot for surface [ɜ] vs. surface [ə] vowel in Kanise

The F1 values of the surface [ɜ] vowel can be seen scattered around the lower range of the surface [ə]
vowels. Furthermore, the centroid of the [ɜ] vowel nearly approaches the average F1 and F2 values
measured for the phonemic [ɜ] vowel in major syllables (see §11.1 for measurements). Nonetheless, the
distribution is not distinct enough to be separated from that of the surface [ə] vowels. This
raises the question of whether the two categories are distinctive and whether [ɜ] and [ə] should be treated
as two separate vowel categories.


5.3 Pitch

The last experiment was conducted to determine the predictability of pitch contours of minor syllables,
and the result points toward two possible interpretations. Figure 5 illustrates the normalized pitch of
minor syllables (labeled “ps”) and major syllables (labeled “ms”) across three tonal categories. In
categories M and L, the surface F0 patterns suggest an interpolation of F0 values from context. The
minor syllable F0 in these two categories shows a falling slope contour, with the degree of slope close
to that of the following full tone. This continuation of contour suggests that the F0 patterns of minor
syllables in these two categories may be an interpolation from the pitch specification of the following
major syllable.


Figure 5: Normalized F0 contours of minor and major syllables of the three tonal categories in Kanise

Category H, by contrast, does not support this interpretation. When preceding an H tone, the pitch
pattern of minor syllables remains largely the same, exhibiting a mid-tone-like downtrend contour. In
other words, the minor syllable pitch in category H does not anticipate the trajectory of the following
full tone. When the three tone categories are collapsed together, the minor syllable pitch has a fairly
consistent shape. The surface falling pitch starts from 150-155Hz (a difference of 0.568ST) and ends at
151-143Hz (a difference of 0.942ST). This fits well within the contour range of the lexical M tone
(133-158Hz) (Ikeda 2021).
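The semitone (ST) differences quoted above follow from the standard conversion of a frequency ratio to semitones, 12 · log2(f2/f1). A brief sketch, with an illustrative function name, checks the two ranges:

```python
import math

def semitone_difference(f_low_hz, f_high_hz):
    """Distance between two frequencies in semitones: 12 * log2(f_high / f_low)."""
    return 12 * math.log2(f_high_hz / f_low_hz)

# Range of the minor syllable pitch at its start (150-155Hz) and end (143-151Hz).
onset_spread_st = semitone_difference(150, 155)
offset_spread_st = semitone_difference(143, 151)
```

Rounded to three decimals, the two values match the 0.568ST and 0.942ST reported in the text, so both ranges are well under one semitone wide.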


6 Discussion 
The study results are consistent with the observations on minor syllables in other Khomic languages. 

First of all, the duration measurements of minor and major syllables indicate that the length of the
minor syllables is roughly a quarter of that of the major syllables in Kanise. This finding conforms
with Herr’s (2011) measurements of Lemi Chin’s minor versus major syllables and the general
description of sesquisyllables being disyllabic iambs where the minor syllable is shorter in duration than
the major syllable (Butler 2014). In addition, major syllable tones have no significant effect on the
duration of the minor syllable, and the durational difference of the minor syllable across the three tonal
categories seems trivial.

As for vowel quality, the acoustic evidence suggests that surface vowels [ə] and [ɜ] are not
acoustically distinct. Though the values for [ɜ] are distributed in the lower area of the vowel space of [ə],
both vowels appear to share substantial vowel space. Therefore, it may be more plausible to assume the
minor syllable vowel in Kanise to be phonologically unspecified or underspecified in vowel quality, as
reported in all related Khomic varieties (Peterson 2010; Herr 2011; Hornéy 2012). Nonetheless, a
further token-by-token study would be needed to validate this explanation.

Lastly, the surface pitch patterns of Kanise minor syllables are, generally speaking,
not subject to environmental influence and remain consistent across the three tonal categories.
Following the same line of argument, the minor syllable pitch contour could be a “default” phonetic
mid-tone, as per West’s (2014) underspecified or neutral tone. This type of tone is not derived from any
lexical tone and lacks significant contrast. In such a case, the tone is not phonologically specified but
cannot be explained without some surface specification, as commonly attested in Chin varieties such as
Lemi Chin (Herr 2011) and Hkongso (Wright 2009) and, more generally, among Tibeto-Burman languages (Morse 1963).

An alternative interpretation would treat the surface [ɜ] vowel and the mid-pitch of minor syllables as
phonologically specified properties. Since the average F1 values for [ɜ] are slightly higher than those for [ə], one
could posit [ɜ] as a vowel with distinct phonemic status, not just an allophone. It is possible
that the short duration of the minor syllable, which correlates with the lack of stress, prevents the
[ɜ] vowel from fully attaining its target or occupying a vowel space distinct from [ə]. It is well established
that when a vowel occurs in weakly stressed syllables, its formants undershoot as the vowel duration
decreases (Lindblom 1963). However, because minor syllable vowels are often neutralized due to vowel
reduction, the mid-central range is a poor area for vowel quality contrasts. We would expect
contrasting sounds to be perceptually distinct instead (Flemming 2004). Hence, this puts the [ɜ] vowel
in a disadvantaged position in functioning as a phonological vowel considering there is “no motivation
to resist the pressure to assimilate to context” (Flemming 2009). Since the current results only touch
upon this possibility, a closer examination is needed to elucidate its status. For example, surface [ɜ]
vowels, or minor syllable vowels in general, should be compared with the [ɜ] vowel that has phoneme
status in major/main syllables in a more carefully designed experiment. In doing so, we could see if
they are phonologically the same vowel placed in two different contexts.

Furthermore, the minor syllable pitch may suggest a phonetic pitch target and thence a phonemic
tone. One argument would be that the surface mid-pitch in the minor syllable is a shortened realization of the
lexical mid-tone. Ikeda (2021) found the lexical mid-tone to have an average F0 of 132Hz at the
end of the rhyme. In contrast, in our experiment, the minor syllable’s mid-pitch ended within a 151-143Hz
range. Therefore, we can propose that the minor syllable mid-pitch does not fully reach the end of the
fall because there is too little time to realize the tone fully, as with the minor syllable’s
vowel. Having said that, the falling contour may not even be a distinctive cue of the lexical mid-tone,
since not all entries of this tone occur with a “short-fall ending” (Ikeda 2021). If the falling ending is
optional, it becomes more plausible that the minor syllable pitch is a tonal phoneme. However, an
in-depth study is needed to confirm such a hypothesis.


7 Conclusion 

The present study has provided a brief description of minor syllables in Kanise based on acoustic 
correlates of vowel duration, vowel quality, and pitch. The results have shown that minor syllables in 
Kanise fit into the description of minor syllables in the Khomic context to a considerable extent. 
However, it should be stressed that looking at these phonetic properties alone does not allow us to draw
conclusions about the phonological status of minor syllables. This is because many factors can be
influential, such as etymological origin, prosodic weight, and the morphological content of minor 
syllables. 

That said, the results of the study have demonstrated the need to provide a more restrictive working 
definition of the minor syllable in Kanise. We have observed several tokens of minor syllables with 
extreme variants of vocalic elements in the experiment on vowel quality. This has raised questions about 
their status as minor syllables, or more explicitly, the non-final syllables of sesquisyllables. Since 
sesquisyllables are a type of disyllabic iamb among other word types (e.g., extended monosyllables,
near-disyllables), there can potentially be an extensive range of duration, vowel, and pitch
distributions in a non-final syllable. Furthermore, these word types appear to lie on a continuum rather
than forming discrete categories (Butler 2014). This motivates the need to explicate what sets minor
syllables apart in this language.

It should be noted that this study has several limitations. The set of sesquisyllables that was chosen
might not be representative enough for this study. A bigger set of data may shed new light. However,
the study is by nature exploratory as it is based on only one speaker. Furthermore, the paradigm created
for this study can also be improved, specifically in controlling the variables that have a potential effect
on the experiment, such as the segmental context (lexical tone, potential boundary tone, and surrounding
consonants).

Although the research paradigm was not designed to address these factors, the study helps us better
understand the phonetic properties of minor syllables in Kanise.


Appendix

Table 7: F1 and F2 values of the phonemic /ɜ/ vowel in Kanise word-initial full syllables

No Token Gloss IPA F1 F2
1 nayn hryh sieve 3r3: oul 636.75Hz 1453.18Hz 
2 may yang to fish p3ljal 634.07Hz 1349.07Hz 
3 | qayn thyhn young (to be young) g3ith'y 556.79Hz 1438.75Hz 
4 pay qui warp (weaving) p3J yuil 558.11Hz 1406.04Hz 
5 pay ryh to draw in a casting net p3:1 cul 522.25Hz 1286.04Hz 
6 tlayn layn to join k13:1 13] 652.47Hz 1484.30Hz 
7 dayn cayn to stand on tiptoe d31 s3J 612.80Hz 1601.54Hz 
8 thayn kay smaller than t31 kod 650.44Hz 1567.03Hz 
Average: 602.93Hz 1400.72Hz 


Note: Measurements were taken across 8 tokens, at the midpoint of the rhyme. 


Table 8: Tokens in Kanise containing the possessive morpheme with surface [ɜ] vowel in this study


No. | Token Gloss IPA No. Token Gloss IPA 
0183 | y koung ‘trunk’ 2374 k6:1 1172 y ku ‘hand’ 234 ku 
0203 y pa ‘flower’ 234 part 1197 y phai ‘thigh’ 231 ptait 
0211 | y thai ‘fruit’ 234 thal? 1219 y hou ‘bone’ 234 howl 
0815 y hei ‘head louse’ 231 hed 1228 y thi ‘blood’ 234 this 
1098 | yqau ‘corpse’ ?31 yaul 1231 y nhan ‘flesh’ 234 9a: 1 
1101 y lu ‘head’ 234 lud 1256 | y hmyhn ‘bile, gall’ 234 mm3:1 
1105 | ysang | ‘hair of head’ 234 sa:1 2219 y hnai “pus” 231 nail 
1114] ymei ‘eye’ 23:1 med 2884 y ni ‘he, she, it’ ?31 nif 
1135) y haw ‘tooth’ 23:1 ha] 


Table 9: Experimental materials: sesquisyllabic cognates (n=113) across eight related Khomic
varieties (White represents phonemic transcription; light grey represents phonetic transcription and
dark grey represents orthography.)


Bryant 2020 Peterson | Herr | Hornéy | Luce Hart- Luce VanBik Shafer 
2019 2011 2012 1985 mann 1985 2009 1944 
1988 
Kanise Khumi Bang- Lemi Mro Awa Kho-mi Ah- Khumi Khimi 
IPA English ladesh Chin | Khimi | Khu- raing 
Khumi mi Khumi 
potlou heart p liwng* | b.1iP° mlv? pele'n pdleu? | pliiwng blung 
koinol ear k’no* k.no! k-nd | kana? kend? kdno? knoo kaénaw 
koini] sun, day knit k.ni! k-ni kani knii kaning 
k61 tomorrow kh’ khoi?. kan 
da dang’ da! dang 
Qotroul six triw! t.rue tri’? triw taruk 
tsatpo:] son c’po! S.po* chapaw 
pollil four p liv b.1i3 peli pluee bili 
tolv3] bear tvéng! | t.v5! tvoceng
tsod daughter | c’niw’® | s.nt° chanu
nnou?4 
polto:1 | to teach p tiw’ m.tu° matu 
pode to kill p doy? p°de'? pthueng madi 
poidaii | tosmooth |  p’lay* ponai 
tolhg! | topierce | t’héng? 
tothi to shake 
pot toopen | p’ewng? | p.?d! a(m)’ paawng 
Paul éwng 
polsg1 | toempty | p’kew 
intestine 
pottol to send p’to* mataw 
potlol to wake pla? 
sm up 
pot to spit p thay! m. petho!? ma 
toed thoi? thawi 
tolbat] mouth V’bewng! | t.b3° tben =| 15 beav® tabawng 
tod cemetery t.prte 
prati| 
tolpo:] cave t.pus tabu 
?31pa?d flower ?.pa3 pa’ paw 
m3Ir3J town mre 
kottu to work k.lu3 
toimoJ wound t.mo! | t-ma tmoo 
?3tthai? fruit ?.thai? otha’ athay athai 
231 val price ?.val avang 
tolvo] stream t.vo! tavaw 
tolvo:] bird t.voe t-va tavo tvoo tavaw 
st:] needle sh prt sprueeng 
pri 
tainaid | to listen t.gai’ tngay tangai 
?nidal | breakfast k.da! 
kodjaul belly ?.jo? 
23:1 eye ?.mi? ami? mei? mel° moey amih 
me 
234 thigh ?.prai? a? p‘e* pra: p‘ae> phay 
prail 
?3: Thod tooth ?. fo! a fa? ho ho? hoo afaw 
kolsel ginger k.thi! 
totkaiJ tiger t.kai! t? ka: tkaay 
kotha] fat (n.) ?.tha3 thaw? thaw 
?nidal | morming k.da! kandang 
soine] year s'.ni! sha nin sdning 
potyul rat bje* pyu buuy 
potyuil snake b.rui! p°vu:' pvuuy magui 
potttal fly m.t'a? | m-tho p*ttar pthaw 
potthi:1 comb m.the pthi
tolgil | intestines ti! tuyvii
tolpo:] mush- t.po? t pho 
room 
kot door leaf k.tho? kathoh 
thaul 
kollail | monkey k.lai3 kolai 
soital | crocodile sh.ta3 
kos3J poison ?.she? 
toimail tail t.mai3 meet tma:' | tsmae’ | tmaay 
231 het louse ?. hi! hei? hoey 
pottsil saliva m.se! 
tole] to write trB tari 
kolduiJ egg k.dui! ke dui kduty 
polgol to steal b.ru3 pviiw magu 
?3dlud head ?.1e! a lu? lu lu? tut 
potlaiJ tongue b. lai! mle? prlai polae* plaay 
pole] shoulder b.1é plein pale? pleéng 
modo net b.lo° maloh 
poilaul | to vomit b.1o° pele" pleew 
potloel salt b.loi° peloi malawi 
podlati canoe b.15> | m-lo/ pleewng 
potigd cotton b.lo° plo 
pot?ol to bake b.203 
pottat poor m.ta° matang 
poite?4 to taste m.de! pte(ng) | mateng 
polnuil | to laugh m.nui! | m-n Ww penu'? pnuy manui 
podsuil | to whistle m.s'ui tpi 
pot to fill m.koi? 
koe 
tolpil squeeze t.pi! 
tol6atl | beak, bill t.b3° 
tolme:1 | to forget t.ma° tamang 
tol y3:1 | war spear t.ri° Pvi 
saikoel to hug t.ko7 tkawy 
tol?ail crab t.?aiP tv? aay 
tolhe1 | to stir up t.he! 
tothul scratch t.ho? Ppy"in 
sot fishhook s.koi! chakawi 
koe 
solb3] to help s'.b33 bawng 
sad thg] God sh.thg! sathaw 
soiral teacher s'.re® 
soik*i] sambar sh kh? skhii 
koldii | to empty k.di! 
kottu:1 | grandchild k.de® ktuu 
koinnil | blanket k.ni° kni
ko1 sil | to polish k.she!

kolsail | elephant k.shai! kesha:i ksaay 

nisaul prawn k.sho! 

kodled weight kl 

kollé] cliff k.lé! 

ka tail debt k.lai! kalai 

ka laJ dance k.la! lan lang 

?al mil name ?.mP omin amueéeng | amiing 
234 bile, gall ?.m# mut’ 

mm3: | 

potto:1 lung m.to° ptaaw 

potke:] kidney m.kei? pkaawy 

231 tH echo ?.t#! 

234 thi: blood eer a t‘i? thy we thii athi 

?31 nil | he, she, it ?.ni! nl ani 

231 nail pus ?.nat! ane? nae? naay 

?3tyaul corpse ?.ro° ka? veu! veew 
231 colors ?.ro! 
nat | 

23484: 1 hair ?.shae as‘a? sham sam! saang dsang 

?3d4kud hand ?.ko3 a koe? ke"? kiw kevu> akuh 
23?4 trunk ?.kt> kiiwng akung 
k6:] 

234 na:1 flesh ?.na° ana?* yan nae! 
234 bone ?.hu? ayu?? he"? hevu* ahuh 
hout ; 

tolvo] stomach th-po/ | ayo? tdpom*? 

Pal sis] drug k-shi 

potsol to wash m-s hi p°s'i? psiiw mise